1 Introduction



The estimation of the quadratic (co-)variation of semimartingales is of great interest in statistics and
financial econometrics. In particular, statistical models that take market microstructure frictions into
account have attracted a lot of attention in recent years. Inspired by empirical studies of the
characteristics of high-frequency financial data, a prominent approach is to describe asset prices as a
superposition of a discretely sampled semimartingale and an independent additive noise component.

Gloter & Jacod (2001) found that observations of a Brownian motion with noise on a discrete grid possess
the LAN property in Le Cam's sense with rate $n^{-1/4}$, instead of the usual $n^{-1/2}$ rate in the
absence of noise. This has provided the optimal rate and a parametric efficiency bound for the asymptotic
variance as a benchmark for this estimation problem. Interestingly, the nuisance quantity, namely the
noise level, can be estimated with the usual faster rate in this model, in contrast to the parameter of
interest. This is caused by observation errors with non-decreasing variances perturbing diffusion
increments of order $n^{-1/2}$. These features carry over to the estimation problem of covariation in a
multidimensional setting, as has been shown in Bibinger (2011a).

The key role of quantifying integrated (co-)volatilities in portfolio optimization and risk management has
stimulated an increasing interest in estimation methods for these models, starting with Aït-Sahalia et al.
(2005) and Zhang et al. (2005). Subsequently, three nonparametric approaches for integrated volatility
estimation have been suggested: the multi-scale realized volatility by Zhang (2006), a pre-average strategy
by Jacod et al. (2009) and the realized kernels from Barndorff-Nielsen et al. (2008). All estimators are
based on quadratic forms of the observations and depend on a globally chosen tuning parameter. For that
reason, when ignoring the treatment of end-effects, all three share a similar asymptotic behavior. They
attain the optimal rate, but cannot be asymptotically efficient for time-varying volatility functions. Still,
several robustness results for more realistic models incorporating non-i.i.d. noise and stochastic
volatilities with leverage have been established and make these approaches quite attractive.

An alternative approach for the estimation of the quadratic variation, arising in Aït-Sahalia et al. (2005)
from the parametric point of view, is based on the MLE for this model. It turned out in Xiu (2010) that the
MLE for integrated volatility can cope with a nonparametric volatility specification. This quasi-maximum
likelihood estimator (QMLE) also attains the optimal rate. Asymptotic efficiency, however, is achieved
only in the parametric setup with constant volatility. In Reiß (2011) an asymptotically efficient estimator
based on spectral theory and localized MLEs for asymptotically shrinking blocks has been constructed.
The idea stems from an asymptotic equivalence result in the spirit of Grama & Nussbaum (2002) pertaining
to the underlying nonparametric setting and a piecewise constant local parametric approximation. In Curci &
Corsi (2011) a related estimation strategy using a discrete sine transform approach is considered and tested
in an application study.

Ongoing progress in this research area has recently led to estimation approaches for the integrated
covolatility in multidimensional models. The above-mentioned methods carry over to a multidimensional
setting. Rate-optimal estimators, which also cope with asynchronous observations, have been established
by Christensen et al. (2010), Aït-Sahalia et al. (2010) and Bibinger (2011a), while Barndorff-Nielsen et al.
(2011) focuses on positive-definite (co)volatility matrix estimators.

The motivation and contribution of the article at hand is twofold. First, we step towards a deeper
understanding of the statistical properties of covariation estimation from noisy discretely observed
diffusions. In particular, we prove that observing two correlated diffusion processes with noise at
synchronous times is asymptotically equivalent (in the sense of Le Cam's equivalence of statistical
experiments) to observations in a related continuous time white noise model. The procedure is completely
explicit and thus allows us to transfer estimators and tests from one model to the other with the same
asymptotic properties. In particular, for bounded loss functions asymptotic efficiency results are the same
in both model sequences. The white noise model itself is asymptotically equivalent to a piecewise
constructed parametric model. That result is an extension of the one-dimensional findings in Reiß (2011)
and gives rise to our local spectral approach. The second contribution is our nonparametric spectral
estimator of covolatility (SPECV) for both the integrated covolatility (i.e. covariation) and the spot
(i.e. instantaneous) covolatility. The estimators are based on certain empirical bivariate Fourier
coefficients on each block in time, which in the piecewise parametric white noise model are just
independent Gaussian vectors in $\mathbb{R}^2$ with volatilities and covolatilities appearing in the
covariance structures. This very simple structure allows a straightforward analysis and often reduces the
estimation variance compared to the previously suggested methods. This is corroborated by simulation
results which show good finite-sample properties.

The article is arranged in three further sections and an appendix comprising the technical proofs.
Section 2 is devoted to the underlying statistical experiments and the asymptotic equivalence results. In
Section 3 we develop the SPECV, the spectral estimator of covolatility, and investigate its mathematical
properties. A discussion and simulation study is provided in Section 4, where the SPECV of integrated
covolatility is compared to competing nonparametric approaches. Owing to its local spectral construction
principle, the new approach outperforms earlier methods if the correlation or volatility processes vary in
time.

2 Asymptotic equivalence of the discrete regression-type and the continuous white noise experiment

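Consider the statistical experiment in which a two-dimensional discrete time process $\tilde Z$ defined by

$$\tilde Z_{t_i^n} = Z_{t_i^n} + \varepsilon_i,\ 0 \le i \le n, \quad\text{with}\quad Z_t = Z_0 + \int_0^t \Sigma_s^{1/2}\,dB_s,\ t \in [0,1], \tag{$\mathcal{E}_0$}$$

is observed, where $B$ is a two-dimensional standard Brownian motion and

$$\Sigma_t = \begin{pmatrix} (\sigma_t^X)^2 & \rho_t\,\sigma_t^X\sigma_t^Y \\ \rho_t\,\sigma_t^X\sigma_t^Y & (\sigma_t^Y)^2 \end{pmatrix}$$

the (spot) volatility matrix. The signal part of $\tilde Z = (\tilde X, \tilde Y)^\top$, denoted
$Z = (X, Y)^\top$, which is called the efficient price process in finance, is independent of the
observation noise $\varepsilon = (\varepsilon^X, \varepsilon^Y)^\top$. The observation errors
$(\varepsilon_i)$ are i.i.d. centred normal with covariance matrix

$$H = \begin{pmatrix} \eta_X^2 & \eta_{XY} \\ \eta_{XY} & \eta_Y^2 \end{pmatrix}.$$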

We consider time-varying volatility matrices $\Sigma$ belonging to a Hölder ball of order
$\alpha \in (0,1]$ and radius $R > 0$, i.e. $\Sigma \in C^\alpha(R)$ with

$$C^\alpha(R) = \bigl\{ f \in C^\alpha([0,1], \mathbb{R}^{2\times 2}) \,\big|\, \|f\|_{C^\alpha} \le R \bigr\}, \quad\text{where}\quad \|f\|_{C^\alpha} := \|f\|_\infty + \sup_{x \ne y} \frac{\|f(x) - f(y)\|}{|x-y|^\alpha}.$$

We denote the spectral norm in $\mathbb{R}^{2\times 2}$ always by $\|\cdot\|$ and define
$\|f\|_\infty := \sup_{t\in[0,1]} \|f(t)\|$.
In ($\mathcal{E}_0$) we allow for a non-equidistant synchronous observation scheme
$(t_i^n)_{0\le i\le n}$, but we will have to impose that the sampling can be transferred to an equidistant
scheme by a quantile transformation independent of $n$.

Assumption 1. Suppose that there exists a differentiable distribution function $F : [0,1] \to [0,1]$ with
$F(0) = 0$, $F(1) = 1$ and $F' > 0$, such that the observation times in ($\mathcal{E}_0$) are generated by
$t_i^n = F^{-1}(i/n)$, $i = 0, \dots, n$.

Note that we only consider deterministic designs of observation times. Under random sampling schemes 
the estimators should have similar properties, but the mathematical analysis is much harder. 
We use a similar notation for the white noise experiment

$$d\tilde Z_t = Z_t\,dt + n^{-1/2} H^{1/2}\,dW_t, \quad t \in [0,1], \tag{$\mathcal{E}_1$}$$

with the covariance matrix $H$ of $\varepsilon$, $Z_t = Z_0 + \int_0^t \Sigma_s^{1/2}\,dB_s$ and a standard
two-dimensional Brownian motion $W$ independent of $B$.

In the following, we shall prove the results for an equidistant setting $t_i^n = i/n$, $i = 0, \dots, n$.
This is founded on the connection between a sampling scheme based on a quantile transformation of the
equidistant grid and an equidistantly observed process with transformed volatility matrix by the identity
in law

$$Z_{F^{-1}(u)} \overset{d}{=} \int_0^u (\Sigma_s^F)^{1/2}\,dB_s \quad\text{with}\quad \Sigma_s^F = \Sigma_{F^{-1}(s)}\,(F^{-1})'(s),$$

which follows directly from the identity for covariance functions of these Gaussian processes via the Itô
isometry. Hence, upcoming results can be generalized to all $F$ satisfying Assumption 1, replacing
everywhere $Z$ by $Z^F$, $t_i^n$ by $i/n$ and $\Sigma$ by $\Sigma^F$. Yet, the ease in dealing with
transformations in the white noise model even gives another useful representation for non-equidistant
design. Experiment ($\mathcal{E}_1$) in terms of observing $Z^F$ in noise is equivalent to observing

$$d\tilde Z_{F(t)} = Z_t\,F'(t)\,dt + n^{-1/2} H^{1/2} F'(t)^{1/2}\,dW_t, \quad t \in [0,1],$$

see below for the exact notion of Le Cam equivalence, which can be easily verified here by the identity of
likelihood processes. Dividing by $F'(t)$ yields further equivalence with observing

$$d\tilde Z_t = Z_t\,dt + (nF'(t))^{-1/2} H^{1/2}\,dW_t, \quad t \in [0,1]. \tag{1}$$

As we shall establish next, experiments ($\mathcal{E}_0$) and ($\mathcal{E}_1$) will be asymptotically
equivalent for $n \to \infty$, and the formulation (1) has a very intuitive meaning: the local noise level
at $t$ is proportional to $(nF'(t))^{-1/2}$, one over the square root of the local sample size $nF'(t)$.
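The role of the design function can be illustrated numerically. The following Python sketch is not part of the original analysis; it assumes a hypothetical design $F(t) = (t + t^2)/2$, which satisfies Assumption 1, generates the observation times $t_i^n = F^{-1}(i/n)$ and compares the number of observations in a window with the local sample size $nF'(t)$ appearing in (1). All function names are chosen for illustration only.

```python
import math

def quantile_design(n, F_inv):
    """Observation times t_i = F^{-1}(i/n), i = 0, ..., n (Assumption 1)."""
    return [F_inv(i / n) for i in range(n + 1)]

# Hypothetical design (an assumption for this sketch):
# F(t) = (t + t^2) / 2, so F(0) = 0, F(1) = 1 and F'(t) = (1 + 2t) / 2 > 0.
def F(t):
    return 0.5 * (t + t * t)

def F_prime(t):
    return 0.5 * (1.0 + 2.0 * t)

def F_inv(u):
    # Positive root of t^2 + t - 2u = 0.
    return 0.5 * (math.sqrt(1.0 + 8.0 * u) - 1.0)

def local_noise_scale(n, t):
    """Factor (n F'(t))^{-1/2} that multiplies H^{1/2} dW_t in (1)."""
    return 1.0 / math.sqrt(n * F_prime(t))

def count_in_window(times, a, b):
    """Number of observation times falling into [a, b]."""
    return sum(1 for t in times if a <= t <= b)

n = 10_000
times = quantile_design(n, F_inv)

# The number of observations in [a, b] is about n (F(b) - F(a)), so the design
# samples more densely near t = 1 (F' = 1.5) than near t = 0 (F' = 0.5).
dense = count_in_window(times, 0.90, 0.95)
sparse = count_in_window(times, 0.05, 0.10)
print(dense, n * (F(0.95) - F(0.90)))
print(sparse, n * (F(0.10) - F(0.05)))
print(local_noise_scale(n, 0.5))
```

Regions with larger $F'$ receive more observations and hence a smaller local noise level, which is the intuition behind (1).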

Definition 1. Let $\mathcal{E}_0(n, \alpha, R, \underline{\Sigma})$ with $n \in \mathbb{N}$,
$\alpha \in (0,1]$, $R, \underline{\Sigma} > 0$ be the statistical experiment generated by observations
from ($\mathcal{E}_0$) with $t_i^n = i/n$. The unknown parameter $\Sigma$ in ($\mathcal{E}_0$) belongs to
the class $C^\alpha(R)$ and satisfies $\Sigma_t \ge \underline{\Sigma} E_2$ for all $t \in [0,1]$ with the
identity matrix $E_2 \in \mathbb{R}^{2\times 2}$, i.e. the smallest eigenvalues of $\Sigma_t$ are larger
than $\underline{\Sigma}$.

Analogously, let $\mathcal{E}_1(n, \alpha, R, \underline{\Sigma})$ be the statistical experiment generated
by observing ($\mathcal{E}_1$) with the same parameter class for $\Sigma$.

For the following results, we briefly recall the notions of asymptotic equivalence, Le Cam deficiency
and Le Cam distance. We refer interested readers to Le Cam & Yang (2000) for more information on the
underlying theory. For statistical experiments
$\mathcal{E}_0 = (\mathcal{X}_0, \mathcal{F}_0, (P_\theta^0)_{\theta\in\Theta})$ and
$\mathcal{E}_1 = (\mathcal{X}_1, \mathcal{F}_1, (P_\theta^1)_{\theta\in\Theta})$
with the same parameter set $\Theta$, defined on (possibly) different Polish spaces, their Le Cam
deficiency is defined by

$$\delta(\mathcal{E}_0, \mathcal{E}_1) = \inf_K \sup_{\theta\in\Theta} \| K P_\theta^0 - P_\theta^1 \|_{TV},$$

where the infimum is taken over all Markov kernels (or randomisations) $K$ from
$(\mathcal{X}_0, \mathcal{F}_0)$ to $(\mathcal{X}_1, \mathcal{F}_1)$. The Le Cam distance is defined by

$$\Delta(\mathcal{E}_0, \mathcal{E}_1) = \max\{\delta(\mathcal{E}_1, \mathcal{E}_0),\ \delta(\mathcal{E}_0, \mathcal{E}_1)\}.$$

If

$$\lim_{n\to\infty} \Delta(\mathcal{E}_0^n, \mathcal{E}_1^n) = 0$$

holds for sequences of experiments $(\mathcal{E}_0^n)_n$ and $(\mathcal{E}_1^n)_n$, then these sequences
are called asymptotically equivalent.

The construction of the Markov kernel $K$ will be explicit in all the proofs given in this article in
terms of data transformations and randomisations.

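Theorem 1. The statistical experiments $\mathcal{E}_0(n, \alpha, R, \underline{\Sigma})$ and
$\mathcal{E}_1(n, \alpha, R, \underline{\Sigma})$ are for any $\alpha > 0$ and $n \to \infty$
asymptotically equivalent. More precisely, the Le Cam distance is of order

$$\Delta(\mathcal{E}_0, \mathcal{E}_1) = \mathcal{O}\bigl(R\,n^{-(\alpha \wedge 1/2)}\,\underline{H}^{-1}\bigr), \tag{2}$$

where $\underline{H}$ denotes the smallest eigenvalue of $H$.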

We explicitly state how asymptotic terms hinge on $\underline{H}$, since this is of interest when
considering noise levels decreasing with $n$. A concise proof of this theorem is given in the appendix.
The strategy of proof follows the same principle as for the one-dimensional setting in Reiß (2011). For
the proof that ($\mathcal{E}_0$) is at least as informative as ($\mathcal{E}_1$), we construct a continuous
time observation by linear interpolation. The interpolated process is a centred Gaussian process on
$[0,1]$. Its covariance operator on $L^2([0,1], \mathbb{R}^2)$ is dominated by the covariance operator of a
white noise model comprising the interpolated signal term, in the sense that the difference of the two
operators is positive (semi-)definite. For this reason observations from such a white noise model can be
generated by adding an independent Gaussian noise component to the interpolated process. Now the observed
process from this white noise model and the observed process in ($\mathcal{E}_1$) can be defined on the
same probability space, and it suffices to show that the total variation distance of the laws converges
to zero uniformly over the parameter class for $\Sigma$. This is accomplished by bounding the squared
Hellinger distance. For the proof of the intuitive converse, that ($\mathcal{E}_1$) is at least as
informative as ($\mathcal{E}_0$), we take means symmetrically around the points $i/n$,
$1 \le i \le n-1$, from ($\mathcal{E}_1$) and verify that the Hellinger distance between the process
generated in this manner and the observed process from ($\mathcal{E}_0$) tends to zero.

An important setting in which the volatility processes are again semimartingales is covered by Theorem 1
for the case that $Z$ remains conditionally Gaussian.

Definition 2. Write $\lfloor t \rfloor_h = \lfloor t/h \rfloor h$ for $h > 0$, assume
$h^{-1}, nh \in \mathbb{N}$ and let $Z_t^h = Z_0 + \int_0^t \Sigma_{\lfloor s \rfloor_h}^{1/2}\,dB_s$ with
a two-dimensional standard Brownian motion $B$. Let $\Sigma$ belong to $C^\alpha(R)$ and satisfy
$\Sigma_t \ge \underline{\Sigma} E_2$. Define the process

$$d\tilde Z_t = Z_t^h\,dt + n^{-1/2} H^{1/2}\,dW_t, \quad t \in [0,1], \tag{$\mathcal{E}_2$}$$

where $W$ is a standard Brownian motion independent of $B$. The statistical model generated by the
observations from ($\mathcal{E}_2$) is denoted by $\mathcal{E}_2(n, h, \alpha, R, \underline{\Sigma})$.

In experiment $\mathcal{E}_2$ we thus observe a process with a volatility matrix which is constant on each
block $[kh, (k+1)h)$, $k = 0, 1, \dots, h^{-1} - 1$. It is intuitive that for small block sizes $h$ and
sufficient Hölder regularity $\alpha$ this piecewise constant approximation is sufficiently close to
render the approximation error statistically negligible. This is made precise in the following theorem.



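Theorem 2. Assume $h^\alpha = o(n^{-1/4})$ for $1/2 < \alpha \le 1$ and $\underline{\Sigma} > 0$. Then the
statistical experiments $\mathcal{E}_1(n, \alpha, R, \underline{\Sigma})$ and
$\mathcal{E}_2(n, h, \alpha, R, \underline{\Sigma})$ are asymptotically equivalent:

$$\Delta(\mathcal{E}_1, \mathcal{E}_2) = \mathcal{O}\bigl(R\,h^\alpha\,\underline{\Sigma}^{-3/4}\,\underline{H}^{-1/4}\,n^{1/4}\bigr). \tag{3}$$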
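The piecewise constant approximation behind ($\mathcal{E}_2$) can be made tangible with a small numerical experiment. The Python sketch below is only an illustration under simplifying assumptions: it uses a hypothetical scalar path $f(t) = \sqrt{t}$, which is Hölder continuous of order $1/2$ with constant 1, in place of an entry of the volatility matrix, and checks that the sup-norm error of freezing $f$ at the left block endpoint stays below $R h^\alpha$.

```python
import math

def block_floor(t, h):
    """Left endpoint floor(t/h) * h of the block [kh, (k+1)h) containing t."""
    return math.floor(t / h) * h

def sup_error_piecewise_constant(f, h, grid_size):
    """Maximum of |f(t) - f(floor_h(t))| over the grid t = l / grid_size."""
    return max(
        abs(f(l / grid_size) - f(block_floor(l / grid_size, h)))
        for l in range(grid_size + 1)
    )

# Hypothetical example path (an assumption for this sketch): sqrt satisfies
# |sqrt(s) - sqrt(t)| <= |s - t|^(1/2), i.e. alpha = 1/2 and R = 1.
alpha, R = 0.5, 1.0
f = math.sqrt

# Block lengths are powers of two so that all grid arithmetic is exact.
errors = {h: sup_error_piecewise_constant(f, h, 4096) for h in (1 / 4, 1 / 16, 1 / 64)}
for h, err in errors.items():
    print(h, err, R * h ** alpha)
```

The error decays like $h^\alpha$, so the condition $h^\alpha = o(n^{-1/4})$ in Theorem 2 ties the admissible block length to the sample size and to the smoothness $\alpha$.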



The asymptotic equivalence results lead to a new approach for the covariation estimation problem.
Following the idea for the scalar case from Reiß (2011), we consider an orthonormal system in
$L^2([0,1])$ of specific cosine functions with support on the blocks $[kh, (k+1)h]$ and frequencies of
order $j \ge 1$. Their antiderivatives are sine functions on the same support and will also play a crucial
role. We set

$$\varphi_{jk}(t) = \sqrt{\frac{2}{h}}\,\cos\bigl(j\pi h^{-1}(t - kh)\bigr)\,1_{[kh,(k+1)h]}(t), \quad j \ge 1,\ k = 0, \dots, h^{-1} - 1, \tag{4a}$$

$$\Phi_{jk}(t) = \sqrt{\frac{2}{h}}\,\Bigl(2n \sin\Bigl(\frac{j\pi}{2nh}\Bigr)\Bigr)^{-1} \sin\bigl(j\pi h^{-1}(t - kh)\bigr)\,1_{[kh,(k+1)h]}(t), \quad j \ge 1,\ k = 0, \dots, h^{-1} - 1. \tag{4b}$$

Differently from Reiß (2011), we appropriately renormalize the antiderivatives (4b) to be equipped for the
discrete analysis. The functions (4a) and (4b), evaluated on the grid given by the observation times,
provide spectral weights for local blockwise averages. By virtue of the transformation for general
observation schemes discussed above, we may for ease of exposition consider the equidistant grid:

$$\tilde x_{jk} = \sum_{l=1}^{n} \bigl(\tilde X_{l/n} - \tilde X_{(l-1)/n}\bigr)\,\Phi_{jk}(l/n), \tag{5a}$$

$$\tilde y_{jk} = \sum_{l=1}^{n} \bigl(\tilde Y_{l/n} - \tilde Y_{(l-1)/n}\bigr)\,\Phi_{jk}(l/n). \tag{5b}$$

Since $\Phi_{j(h^{-1}-1)}(1) = 0$, the last addend is zero for all blocks $k$. We stress that by the
indicator functions in (4a) and (4b) and since $\Phi_{jk}(kh) = \Phi_{jk}((k+1)h) = 0$, the sums in (5a)
and (5b) only extend over $l = k \cdot nh + 1, \dots, (k+1) \cdot nh - 1$. Therefore, the families
$(\tilde x_{jk}, \tilde y_{jk})_j$ are uncorrelated and thus, by Gaussianity, independent for different
blocks $k$. Besides the independence between blocks, we additionally benefit from the orthogonality of
each family of functions associated with a specific period or frequency. The orthogonality relations
$\int \varphi_{jk}\varphi_{ik} = 0$ and $\int \Phi_{jk}\Phi_{ik} = 0$ for all $i \ne j$ in $L^2([0,1])$
remain valid for the discretized versions and the corresponding sums when $i, j \in \{1, \dots, nh\}$. For
the purpose of explicitly analyzing the discrete terms, we introduce the notion of empirical scalar
products:

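$$\langle f, g \rangle_n := \frac{1}{n} \sum_{l=1}^{n} f\Bigl(\frac{l}{n}\Bigr)\, g\Bigl(\frac{l}{n}\Bigr) \quad\text{and}\quad \|f\|_n^2 := \frac{1}{n} \sum_{l=1}^{n} f^2\Bigl(\frac{l}{n}\Bigr) = \langle f, f \rangle_n, \tag{6a}$$

$$[f, g]_n := \frac{1}{n} \sum_{l=1}^{n} f\Bigl(\frac{l - \frac12}{n}\Bigr)\, g\Bigl(\frac{l - \frac12}{n}\Bigr), \quad f, g : [0,1] \to \mathbb{R}. \tag{6b}$$

By abuse of notation, for a vector $Z = (Z_1, \dots, Z_n)$ and $f : [0,1] \to \mathbb{R}$, we will also
write

$$\langle Z, f \rangle_n := \frac{1}{n} \sum_{l=1}^{n} Z_l\, f\Bigl(\frac{l}{n}\Bigr) \quad\text{and}\quad [Z, f]_n := \frac{1}{n} \sum_{l=1}^{n} Z_l\, f\Bigl(\frac{l - \frac12}{n}\Bigr). \tag{6c}$$

For two vectors $Z$ and $Z'$ it is convenient to introduce the notation

$$\langle Z, Z' \rangle_{nh;k} := \frac{1}{n} \sum_{l=1}^{n} Z_l Z'_l\, 1_{[kh,(k+1)h]}\Bigl(\frac{l}{n}\Bigr) = \frac{1}{n} \sum_{i=0}^{nh} Z_{knh+i} Z'_{knh+i}. \tag{6d}$$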

The following identity is a main ingredient in the construction of the estimator and for its error analysis 
below. 

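Proposition 1. For the blockwise weighted sums $\tilde x_{jk}, \tilde y_{jk}$, $j \in \{1, \dots, nh\}$,
$k \in \{0, \dots, h^{-1} - 1\}$, the following summation by parts formula holds true:

$$\tilde y_{jk} = \sum_{l=1}^{n} \Delta\tilde Y_l\, \Phi_{jk}\Bigl(\frac{l}{n}\Bigr) = -\frac{1}{n} \sum_{l=0}^{n-1} \tilde Y_l\, \varphi_{jk}\Bigl(\frac{l + \frac12}{n}\Bigr), \tag{7}$$

and for $\tilde x_{jk}$ analogously, where $\Delta$ denotes the backward difference operator
$\Delta Y_l = Y_l - Y_{l-1}$ and $\Delta Y = (\Delta Y_1, \dots, \Delta Y_n)^\top$. Moreover, we have the
following orthogonality identities:

$$[\varphi_{jk}, \varphi_{rk}]_n = \delta_{jr}, \quad j, r \in \{1, \dots, nh\},\ k = 0, \dots, h^{-1} - 1, \tag{8a}$$

$$\langle \Phi_{jk}, \Phi_{rk} \rangle_n = \|\Phi_{jk}\|_n^2\, \delta_{jr}, \quad j, r \in \{1, \dots, nh\},\ k = 0, \dots, h^{-1} - 1, \tag{8b}$$

where $\delta_{jr}$ is Kronecker's delta. The empirical norm

$$\|\Phi_{jk}\|_n^2 = \bigl(4n^2 \sin^2(j\pi/(2nh))\bigr)^{-1}, \quad k \in \{0, \dots, h^{-1} - 1\}, \tag{9}$$

does not depend on the block $k$ and appears in our estimator in the next section.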
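The identities of Proposition 1 are easy to check numerically. The following Python sketch is a sanity check rather than a proof. It assumes the block functions in the form $\varphi_{jk}(t) = \sqrt{2/h}\cos(j\pi h^{-1}(t-kh))$ and $\Phi_{jk}(t) = \sqrt{2/h}\,(2n\sin(j\pi/(2nh)))^{-1}\sin(j\pi h^{-1}(t-kh))$ on $[kh,(k+1)h]$, evaluates both sides of the summation by parts formula for an arbitrary test path, and compares the empirical scalar products with (8b) and (9) for frequencies $j < nh$. The grid sizes and the test path are arbitrary choices.

```python
import math
import random

def phi(j, k, t, h):
    """Cosine function (4a): sqrt(2/h) cos(j pi (t - kh) / h) on [kh, (k+1)h]."""
    if t < k * h or t > (k + 1) * h:
        return 0.0
    return math.sqrt(2.0 / h) * math.cos(j * math.pi * (t - k * h) / h)

def Phi(j, k, t, h, n):
    """Renormalized sine function (4b) on the block [kh, (k+1)h]."""
    if t < k * h or t > (k + 1) * h:
        return 0.0
    scale = math.sqrt(2.0 / h) / (2.0 * n * math.sin(j * math.pi / (2.0 * n * h)))
    return scale * math.sin(j * math.pi * (t - k * h) / h)

def emp_inner(j, r, k, h, n):
    """Empirical scalar product <Phi_jk, Phi_rk>_n as in (6a)."""
    return sum(Phi(j, k, l / n, h, n) * Phi(r, k, l / n, h, n)
               for l in range(1, n + 1)) / n

def norm_formula(j, h, n):
    """Closed form (9) for the squared empirical norm of Phi_jk."""
    return 1.0 / (4.0 * n * n * math.sin(j * math.pi / (2.0 * n * h)) ** 2)

def sum_by_parts_sides(y, j, k, h, n):
    """Both sides of the summation by parts formula for a path y_0, ..., y_n."""
    lhs = sum((y[l] - y[l - 1]) * Phi(j, k, l / n, h, n) for l in range(1, n + 1))
    rhs = -sum(y[l] * phi(j, k, (l + 0.5) / n, h) for l in range(n)) / n
    return lhs, rhs

# n and h are powers of two so that grid points and block limits are exact floats.
n, h = 64, 1 / 8
rng = random.Random(1)
y = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]   # arbitrary test path

lhs, rhs = sum_by_parts_sides(y, j=3, k=2, h=h, n=n)
print(lhs, rhs)
print(emp_inner(3, 3, 2, h, n), norm_formula(3, h, n))
print(emp_inner(2, 5, 2, h, n))
```

The left-hand side only involves increments of the path, while the right-hand side involves its levels, which is why the two forms are convenient for the signal and the noise part, respectively.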

The two representations of the blockwise sums in (7) are very useful when disentangling the estimation
error emerging from the two independent error sources: discretization and observation noise. In
particular, we use the left-hand side, which involves the increments of the processes only, when
considering the signal parts $X$ and $Y$. For the analysis of cross terms and the pure noise parts the
right-hand side of (7) permits a significant simplification. In the next section, we use these ideas and
the insight into the structure of the estimation problem to construct a new estimation approach for the
quadratic covariation and the spot covolatility of diffusion processes based on the original model
($\mathcal{E}_0$). The final estimator for the quadratic covariation appears as a linear combination of
the products of the local spectral averages $\tilde x_{jk}\tilde y_{jk}$ over all $j$ and $k$ combined
with a bias correction. For the mathematical analysis of our estimator we benefit from the asymptotic
equivalence results, which allow us to restrict the analysis to the statistical experiment

$$\tilde Z^h_{t_i^n} = Z^h_{t_i^n} + \varepsilon_i,\ 0 \le i \le n, \quad\text{with}\quad Z_t^h = Z_0 + \int_0^t \Sigma_{\lfloor s \rfloor_h}^{1/2}\,dB_s,\ t \in [0,1], \tag{$\mathcal{E}_3$}$$

where we have noisy discrete observations with the volatility matrix being constant on blocks.

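Proposition 2. For $nh \in \mathbb{N}$, $\alpha, R > 0$ and $\underline{\Sigma} > 0$ the statistical
experiments $\mathcal{E}_2(n, h, \alpha, R, \underline{\Sigma})$ and
$\mathcal{E}_3(n, h, \alpha, R, \underline{\Sigma})$ with $t_i^n = i/n$ are asymptotically equivalent:

$$\Delta(\mathcal{E}_2, \mathcal{E}_3) = \mathcal{O}\bigl(R\,\underline{H}^{-1}\,n^{-1/2}\bigr). \tag{10}$$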



Consequently, observing $\tilde Z$ in ($\mathcal{E}_0$) is asymptotically equivalent to observations of
$\tilde Z^h$ from ($\mathcal{E}_3$). Note that for constant $\Sigma_{kh}$ on each block, the $\Phi_{jk}$,
evaluated on the grid, have the same structure as the eigenvectors of the covariance matrix associated
with the vector of the $(nh - 1)$ observed increments on the block. The local weighted sums (5a) and (5b)
on each block hence constitute the corresponding Karhunen-Loève expansion. We refer to Bibinger (2011a)
and, for the one-dimensional case, to Gloter & Jacod (2001) and Curci & Corsi (2011) for the explicit
computation of the eigenvalues and eigenvectors.

3 Local spectral estimation of covolatility

In the sequel, we always assume $h^\alpha = o(n^{-1/4})$, Assumption 1 on the sampling scheme and that the
volatility matrix belongs to $C^\alpha(R)$ for some $\alpha > 1/2$ and $R > 0$ and is bounded from below
by a positive constant. By virtue of Proposition 2 we can then work within the simpler model
($\mathcal{E}_3$). We present all results for the equidistant design $t_i^n = i/n$, noting again that the
general case follows by substituting $Z$ by $Z^F$, $\Sigma$ by $\Sigma^F$ etc. Interestingly, integrated
volatility is even invariant under this transformation:

$$\int_0^1 \Sigma_u^F\,du = \int_0^1 \Sigma_{F^{-1}(u)}\,(F^{-1})'(u)\,du = \int_0^1 \Sigma_t\,dt.$$

For estimation purposes this means that we can just neglect the design in the implementation. The
invariance property, however, does not hold for powers of $\Sigma$ or for polynomials in $\sigma^X$,
$\sigma^Y$ of degree different from two, such that the asymptotic variance will significantly depend on
the design function $F$.
On each of the independent blocks, we have observations (5a) and (5b) with

$$\begin{pmatrix} \tilde x_{jk} \\ \tilde y_{jk} \end{pmatrix} \sim N\left(0,\ \begin{pmatrix} \|\Phi_{jk}\|_n^2 (\sigma_{kh}^X)^2 + \eta_X^2/n & \|\Phi_{jk}\|_n^2\, \rho_{kh}\sigma_{kh}^X\sigma_{kh}^Y + \eta_{XY}/n \\ \|\Phi_{jk}\|_n^2\, \rho_{kh}\sigma_{kh}^X\sigma_{kh}^Y + \eta_{XY}/n & \|\Phi_{jk}\|_n^2 (\sigma_{kh}^Y)^2 + \eta_Y^2/n \end{pmatrix}\right)$$

independently for all $j, k$, which can be proved using Proposition 1. We postpone a detailed computation
of estimation errors to Appendix B. For each $j, k$ fixed, the empirical covariance yields a natural
estimator of the spot covolatility $\rho_{kh}\sigma_{kh}^X\sigma_{kh}^Y$ on each block, provided we correct
the bias by subtracting $\eta_{XY}/n$.

Remark 1. In the following we assume for ease of exposition that $\eta_{XY}$ is known. Yet we can
estimate $\eta_{XY}$ from the observations with the faster rate $\sqrt{n}$ by

$$\hat\eta_{XY} = \frac{1}{2n} \sum_{i=1}^{n} \bigl(\tilde X_{i/n} - \tilde X_{(i-1)/n}\bigr)\bigl(\tilde Y_{i/n} - \tilde Y_{(i-1)/n}\bigr) \tag{11}$$

or as well by $-n^{-1} \sum_i (\tilde Y_{i/n} - \tilde Y_{(i-1)/n})(\tilde X_{(i+1)/n} - \tilde X_{i/n})$.
For the first estimator $\sqrt{n}$-consistency and a central limit theorem can be proved in the spirit of
Zhang et al. (2005) for its one-dimensional counterpart
$(2n)^{-1} \sum_i (\tilde X_{i/n} - \tilde X_{(i-1)/n})^2$. The second estimator and its one-dimensional
analogue $-n^{-1} \sum_i (\tilde X_{i/n} - \tilde X_{(i-1)/n})(\tilde X_{(i+1)/n} - \tilde X_{i/n})$ have a
slightly bigger variance but the benefit of no finite sample bias due to the quadratic (co-)variation of
the signal part.
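As an illustration of Remark 1, the following Python sketch implements the two estimators of the noise covariance on simulated data. It is a toy experiment under assumptions that are not taken from the text: constant unit volatilities, signal correlation 0.5, unit noise variances and noise covariance $\eta_{XY} = 0.5$. The helper `simulate` is hypothetical and only serves to generate such data.

```python
import math
import random

def noise_cov_first(x, y):
    """Estimator (11): (2n)^{-1} times the sum of products of simultaneous increments."""
    n = len(x) - 1
    return sum((x[i] - x[i - 1]) * (y[i] - y[i - 1]) for i in range(1, n + 1)) / (2 * n)

def noise_cov_second(x, y):
    """Lag-one variant: -n^{-1} sum_i (y_i - y_{i-1}) (x_{i+1} - x_i)."""
    n = len(x) - 1
    return -sum((y[i] - y[i - 1]) * (x[i + 1] - x[i]) for i in range(1, n)) / n

def simulate(n, rho, eta_x, eta_y, eta_xy, seed):
    """Noisy observations of two correlated Brownian motions on the grid i/n.

    Hypothetical toy model: unit volatilities, signal correlation rho,
    i.i.d. Gaussian noise with variances eta_x^2, eta_y^2 and covariance eta_xy.
    """
    rng = random.Random(seed)
    r = eta_xy / (eta_x * eta_y)          # correlation of the two noise components
    x_sig = y_sig = 0.0
    xs, ys = [], []
    for _ in range(n + 1):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(x_sig + eta_x * e1)
        ys.append(y_sig + eta_y * (r * e1 + math.sqrt(1.0 - r * r) * e2))
        b1, b2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x_sig += b1 / math.sqrt(n)
        y_sig += (rho * b1 + math.sqrt(1.0 - rho * rho) * b2) / math.sqrt(n)
    return xs, ys

xs, ys = simulate(n=20_000, rho=0.5, eta_x=1.0, eta_y=1.0, eta_xy=0.5, seed=0)
print(noise_cov_first(xs, ys), noise_cov_second(xs, ys))   # both should be near 0.5
```

In the first estimator the signal increments contribute a term of order $1/n$ to the expectation, which is the finite sample bias mentioned in the remark; the lag-one variant multiplies increments over disjoint intervals and avoids it.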

By using just the lowest frequency $j = 1$ in each block, we obtain a simple rate-optimal estimator of
integrated covolatility when summing over all blocks $[kh, (k+1)h]$ multiplied by the block length $h$:

$$\widehat{IC}_n^{(\mathrm{SPECV},\,j=1)} = h \sum_{k=0}^{h^{-1}-1} \|\Phi_{1k}\|_n^{-2} \bigl(\tilde x_{1k}\tilde y_{1k} - \eta_{XY}/n\bigr). \tag{12}$$

By independence between the blocks, its variance is of order
$\mathcal{O}\bigl(h^{-3}(\eta_X^2/n + h^2)(\eta_Y^2/n + h^2)\bigr)$. For fixed noise levels
$\eta_X, \eta_Y, \eta_{XY}$ the rate-optimal choice $h \asymp n^{-1/2}$ thus yields a variance of order
$\mathcal{O}(n^{-1/2})$ (note that for $\alpha > 1/2$ and $h \asymp n^{-1/2}$ the condition
$h^\alpha = o(n^{-1/4})$ always holds).
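The lowest-frequency estimator is simple enough to be written down in a few lines. The Python sketch below is one possible reading of (12) under stated assumptions: equidistant observations $x_0, \dots, x_n$ and $y_0, \dots, y_n$, an integer number $nh$ of grid points per block, block weights $\Phi_{1k}(l/n) = \sqrt{2/h}\,(2n\sin(\pi/(2nh)))^{-1}\sin(\pi i/(nh))$ at the $i$-th grid point of a block, and a known noise covariance `eta_xy`. The simulated data (correlation 0.5, independent noise with standard deviation 0.01) are hypothetical.

```python
import math
import random

def Phi1(i, n, h):
    """Value of Phi_{1k} at the i-th grid point of a block, i = 0, ..., nh."""
    m = round(n * h)
    scale = math.sqrt(2.0 / h) / (2.0 * n * math.sin(math.pi / (2.0 * m)))
    return scale * math.sin(math.pi * i / m)

def specv_lowest_frequency(x, y, h, eta_xy=0.0):
    """Sketch of estimator (12) from observations x_0..x_n, y_0..y_n on the grid i/n."""
    n = len(x) - 1
    m = round(n * h)                       # grid points per block, nh
    blocks = n // m                        # number of blocks, 1/h
    norm_sq = 1.0 / (4.0 * n * n * math.sin(math.pi / (2.0 * m)) ** 2)   # (9) with j = 1
    weights = [Phi1(i, n, h) for i in range(m + 1)]
    total = 0.0
    for k in range(blocks):
        base = k * m
        # Blockwise weighted sums of increments, cf. (5a) and (5b).
        xk = sum((x[base + i] - x[base + i - 1]) * weights[i] for i in range(1, m + 1))
        yk = sum((y[base + i] - y[base + i - 1]) * weights[i] for i in range(1, m + 1))
        total += (xk * yk - eta_xy / n) / norm_sq
    return h * total

# Hypothetical toy data: unit volatilities, correlation rho, independent noise.
rng = random.Random(0)
n, h, rho, eta = 4096, 1 / 64, 0.5, 0.01
x, y, x_sig, y_sig = [], [], 0.0, 0.0
for _ in range(n + 1):
    x.append(x_sig + eta * rng.gauss(0.0, 1.0))
    y.append(y_sig + eta * rng.gauss(0.0, 1.0))
    b1, b2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x_sig += b1 / math.sqrt(n)
    y_sig += (rho * b1 + math.sqrt(1.0 - rho ** 2) * b2) / math.sqrt(n)

ic_hat = specv_lowest_frequency(x, y, h)   # target: integrated covolatility rho = 0.5
print(ic_hat)
```

With $h = n^{-1/2}$ the estimate fluctuates around the target with a standard deviation of order $n^{-1/4}$, so a single run at this sample size is only a rough indication.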

It is possible to obtain a pointwise estimator of the spot covolatility
$SCV_t := \rho_t\sigma_t^X\sigma_t^Y$ by the average of the spectral estimators over a set $\mathcal{K}_t$
of $K$ adjacent blocks containing $t$:

$$\widehat{SCV}_t = K^{-1} \sum_{k \in \mathcal{K}_t} \|\Phi_{1k}\|_n^{-2} \bigl(\tilde x_{1k}\tilde y_{1k} - \eta_{XY}/n\bigr). \tag{13}$$

Since the observation times in $\mathcal{K}_t$ have at most distance $Kh$ to $t$, the approximation error
bound for the $\alpha$-Hölder continuous function $\Sigma$ yields a squared bias of order
$\mathcal{O}((Kh)^{2\alpha})$. The variance is $\mathcal{O}(K^{-1})$ for $h \gtrsim n^{-1/2}$, and we
obtain for the rate-optimal choices $h \asymp n^{-1/2}$, $K \asymp n^{\alpha/(2\alpha+1)}$ a root mean
squared error of order $\mathcal{O}(n^{-\alpha/(4\alpha+2)})$. Standard nonparametric techniques based on
Gaussian measure concentration then even give the same rate times a log-factor in $n$ for uniform loss in
$t$, i.e.

$$\mathbb{E}\Bigl[\sup_{t\in[0,1]} \bigl|\widehat{SCV}_t - SCV_t\bigr|\Bigr] = \mathcal{O}\bigl((n/\log n)^{-\alpha/(4\alpha+2)}\bigr).$$
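The rate-optimal choices follow from balancing squared bias and variance, which can be checked with a few lines of arithmetic. The Python sketch below sets all constants to one (an assumption made only for illustration) and verifies that for $h = n^{-1/2}$ and $K = n^{\alpha/(2\alpha+1)}$ the squared bias order $(Kh)^{2\alpha}$ equals the variance order $K^{-1}$, and that the resulting root mean squared error order is $n^{-\alpha/(4\alpha+2)}$.

```python
import math

def spot_tuning(n, alpha):
    """Orders of h, K, squared bias, variance and RMSE (all constants set to 1)."""
    h = n ** -0.5
    K = n ** (alpha / (2.0 * alpha + 1.0))
    sq_bias = (K * h) ** (2.0 * alpha)     # order of the squared bias
    variance = 1.0 / K                     # order of the variance
    rmse = n ** (-alpha / (4.0 * alpha + 2.0))
    return h, K, sq_bias, variance, rmse

for alpha in (0.6, 0.75, 1.0):
    h, K, sq_bias, variance, rmse = spot_tuning(10 ** 6, alpha)
    print(alpha, K, sq_bias, variance, rmse)
```

For a smoother volatility (larger $\alpha$) more blocks can be averaged, and the rate approaches $n^{-1/6}$ for $\alpha = 1$.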



For estimation of the integrated covolatility we are not content with rate-optimality, but we also want to
minimize the asymptotic variance. By independence we gain in efficiency by using on each block a convex
combination of the estimators over all frequencies $j$. In order to estimate the integrated covolatility,
we then just sum these estimators over all blocks. We end up with the following spectral estimation
approach with local weights $w_{jk}$ satisfying $\sum_{j=1}^{nh} w_{jk} = 1$:

$$\widehat{IC}_n^{(\mathrm{SPECV})} = \sum_{k=0}^{h^{-1}-1} h \sum_{j=1}^{nh} w_{jk}\, \|\Phi_{jk}\|_n^{-2} \bigl(\tilde x_{jk}\tilde y_{jk} - \eta_{XY}/n\bigr). \tag{14}$$



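The optimal weights (minimizing the variance) depend on the unknown spot volatility matrix. As will be
shown in the proof of Theorem 3, they are given by $w_{jk}^{\mathrm{opt}} = w_j(\Sigma_{kh})$ with

$$w_j(\Sigma) = \frac{V_j(\Sigma)^{-1}}{\sum_{r=1}^{nh} V_r(\Sigma)^{-1}}, \tag{15}$$

$$V_j(\Sigma) = \frac{\|\Phi_{jk}\|_n^{-4}}{n^2} \bigl(\eta_X^2\eta_Y^2 + \eta_{XY}^2\bigr) + (1 + \rho^2)(\sigma^X\sigma^Y)^2 + \frac{\|\Phi_{jk}\|_n^{-2}}{n} \bigl((\sigma^X)^2\eta_Y^2 + (\sigma^Y)^2\eta_X^2 + 2\rho\,\sigma^X\sigma^Y\eta_{XY}\bigr).$$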
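The structure of the optimal weights can be explored numerically. The Python sketch below assumes that the weights are proportional to the inverse of $V_j = \lambda_j^2(\eta_X^2\eta_Y^2 + \eta_{XY}^2) + (1+\rho^2)(\sigma^X\sigma^Y)^2 + \lambda_j((\sigma^X)^2\eta_Y^2 + (\sigma^Y)^2\eta_X^2 + 2\rho\sigma^X\sigma^Y\eta_{XY})$ with $\lambda_j = \|\Phi_{jk}\|_n^{-2}/n$, which is the variance of the bias-corrected product $\|\Phi_{jk}\|_n^{-2}\tilde x_{jk}\tilde y_{jk}$ under the Gaussian block model. Function names and parameter values are illustrative only.

```python
import math

def inv_norm_sq(j, n, h):
    """||Phi_jk||_n^{-2} = 4 n^2 sin^2(j pi / (2 n h)), cf. (9)."""
    return 4.0 * n * n * math.sin(j * math.pi / (2.0 * n * h)) ** 2

def oracle_weights(n, h, sig_x, sig_y, rho, eta_x2, eta_y2, eta_xy):
    """Weights over j = 1, ..., nh proportional to the inverse variance V_j^{-1}."""
    m = round(n * h)
    inv_var = []
    for j in range(1, m + 1):
        lam = inv_norm_sq(j, n, h) / n
        var_j = (lam * lam * (eta_x2 * eta_y2 + eta_xy ** 2)
                 + (1.0 + rho ** 2) * (sig_x * sig_y) ** 2
                 + lam * (sig_x ** 2 * eta_y2 + sig_y ** 2 * eta_x2
                          + 2.0 * rho * sig_x * sig_y * eta_xy))
        inv_var.append(1.0 / var_j)
    total = sum(inv_var)
    return [v / total for v in inv_var]

# Hypothetical parameter values for illustration.
w = oracle_weights(n=4096, h=1 / 64, sig_x=2.0, sig_y=1.0, rho=0.5,
                   eta_x2=1.0, eta_y2=1.0, eta_xy=0.0)
print(w[:5])      # low frequencies carry most of the weight
print(sum(w))

# Without noise every frequency is equally informative.
w0 = oracle_weights(n=4096, h=1 / 64, sig_x=2.0, sig_y=1.0, rho=0.5,
                    eta_x2=0.0, eta_y2=0.0, eta_xy=0.0)
print(w0[0], 1 / 64)
```

The weights decrease in $j$ because $\|\Phi_{jk}\|_n^{-2}$ grows with the frequency, so that the noise terms dominate the variance at high frequencies.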



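They give rise to the oracle version of our spectral estimator of covolatility (SPECV)

$$\widehat{IC}_{\mathrm{oracle},n}^{(\mathrm{SPECV})} = \sum_{k=0}^{h^{-1}-1} h \sum_{j=1}^{nh} w_j(\Sigma_{kh})\, \|\Phi_{jk}\|_n^{-2} \bigl(\tilde x_{jk}\tilde y_{jk} - \eta_{XY}/n\bigr). \tag{16}$$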



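Using adequate consistent pilot estimates, we obtain a feasible estimator which is asymptotically as
efficient as the oracle estimator. Besides (13) we need the corresponding estimators for the spot
volatilities $(\sigma_t^X)^2$, $(\sigma_t^Y)^2$:

$$\widehat{(\sigma_t^X)^2} = K^{-1} \sum_{k \in \mathcal{K}_t} \|\Phi_{1k}\|_n^{-2} \bigl(\tilde x_{1k}^2 - \eta_X^2/n\bigr), \qquad \widehat{(\sigma_t^Y)^2} = K^{-1} \sum_{k \in \mathcal{K}_t} \|\Phi_{1k}\|_n^{-2} \bigl(\tilde y_{1k}^2 - \eta_Y^2/n\bigr), \tag{17}$$

which also satisfy

$$\mathbb{E}\Bigl[\sup_{t\in[0,1]} \bigl|\widehat{(\sigma_t^X)^2} - (\sigma_t^X)^2\bigr|\Bigr] = \mathcal{O}\bigl((n/\log n)^{-\alpha/(4\alpha+2)}\bigr)$$

for $h \asymp n^{-1/2}$, $K \asymp n^{\alpha/(2\alpha+1)}$, and analogously for $Y$. In particular, all
estimators are uniformly (in $t$) consistent as the sample size tends to infinity. By using just a
negligible fraction of the data with sample size $m_n \to \infty$, $m_n = o(n)$, we dispose of a uniformly
consistent estimator $\hat\Sigma_{t,n}$ of $\Sigma_t$ which is independent of the SPECV estimator when the
latter is based on the remaining $n - m_n = n(1 - o(1))$ observations. This gives a concrete construction
for the pilot estimator used in the following main theorem.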



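Theorem 3. We observe from model ($\mathcal{E}_0$) with $\Sigma \in C^\alpha(R)$, $R > 0$, $\alpha > 1/2$
and $\underline{\Sigma} > 0$. Choose $h \asymp n^{-1/2}\log(n)$. The resulting adaptive spectral estimator
of covolatility (SPECV) for the integrated covolatility is

$$\widehat{IC}_n^{(\mathrm{SPECV})} = \sum_{k=0}^{h^{-1}-1} h \sum_{j=1}^{nh} w_j(\hat\Sigma_{kh,n})\, \|\Phi_{jk}\|_n^{-2} \bigl(\tilde x_{jk}\tilde y_{jk} - \eta_{XY}/n\bigr) \tag{18}$$

with a pilot estimator $\hat\Sigma_{t,n}$ of the spot covolatility matrix $\Sigma_t$ inserted into the
oracle weight formula (15). If $\hat\Sigma_{t,n}$ is consistent uniformly in $t \in [0,1]$ and independent
of the data used in $(\tilde x_{jk}, \tilde y_{jk})_{jk}$, then both the adaptive and the oracle SPECV
estimator satisfy the same central limit theorem:

$$n^{1/4}\Bigl(\widehat{IC}_{\mathrm{oracle},n}^{(\mathrm{SPECV})} - \int_0^1 \rho_t\sigma_t^X\sigma_t^Y\,dt\Bigr) \longrightarrow N\Bigl(0,\ \bigl(\eta_X^2\eta_Y^2 + \eta_{XY}^2\bigr)^{1/4} \int_0^1 v_t\,dt\Bigr), \tag{19}$$

$$n^{1/4}\Bigl(\widehat{IC}_n^{(\mathrm{SPECV})} - \int_0^1 \rho_t\sigma_t^X\sigma_t^Y\,dt\Bigr) \longrightarrow N\Bigl(0,\ \bigl(\eta_X^2\eta_Y^2 + \eta_{XY}^2\bigr)^{1/4} \int_0^1 v_t\,dt\Bigr), \tag{20}$$

with

$$v_t = \sqrt{2(A_t^2 - B_t)B_t}\ \Bigl(\sqrt{A_t + \sqrt{A_t^2 - B_t}} - \operatorname{sgn}(A_t^2 - B_t)\sqrt{A_t - \sqrt{A_t^2 - B_t}}\Bigr)^{-1} \tag{21}$$

and
$A_t = (\eta_X^2\eta_Y^2 + \eta_{XY}^2)^{-1/2}\bigl(\eta_Y^2(\sigma_t^X)^2 + \eta_X^2(\sigma_t^Y)^2 + 2\rho_t\sigma_t^X\sigma_t^Y\eta_{XY}\bigr)$
and $B_t = 4(\sigma_t^X\sigma_t^Y)^2(1 + \rho_t^2)$.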

The independence of the pilot estimator from the data used in the main estimator is assumed for technical
reasons. We believe that the result continues to hold without this assumption, which is also supported by
simulations. In Appendix B we learn that high spectral frequencies have decreasing weights and that
frequencies exceeding some threshold will asymptotically not contribute to the estimation. For a
practicable and tractable application of the SPECV it suffices to sum up frequencies in (18) only up to a
spectral cut-off $J_n \le nh$. We refer to Reiß (2011) for more information on the cut-off.

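We complete the overview on the estimation of the (co)volatility matrix by recalling the corresponding
univariate estimator for the integrated volatilities:

$$\widehat{IV}_n^{(\mathrm{SPEV})} = \sum_{k=0}^{h^{-1}-1} h \sum_{j=1}^{nh} w_{jk}^X\, \|\Phi_{jk}\|_n^{-2} \bigl(\tilde x_{jk}^2 - \eta_X^2/n\bigr), \tag{22}$$

which we call SPEV, with the oracle weights

$$w_{jk}^X = \frac{\bigl(\|\Phi_{jk}\|_n^{-2}(\eta_X^2/n) + (\sigma_{kh}^X)^2\bigr)^{-2}}{\sum_{r=1}^{nh} \bigl(\|\Phi_{rk}\|_n^{-2}(\eta_X^2/n) + (\sigma_{kh}^X)^2\bigr)^{-2}}$$

and analogously for $Y$.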

In general, the noise levels $\eta_X, \eta_Y, \eta_{XY}$ are unknown, but they can be estimated with the
faster rate $\sqrt{n}$ as mentioned above. A result with a preestimated error covariance matrix $H$ can be
derived as for the preestimated $\Sigma_{kh}$ above. Furthermore, it is of high practical interest to study
how our covolatility estimator behaves under a vanishing microstructure noise level, i.e. in the case
$H = 0$. In that case the oracle weights are all equal, $w_{jk} = 1/(nh)$, and on each block we estimate
the block covolatility by the sum

$$\frac{1}{n} \sum_{j=1}^{nh} \|\Phi_{jk}\|_n^{-2}\, \tilde x_{jk}\tilde y_{jk}$$

of discrete Fourier coefficients with respect to $(\Phi_{jk})_{1 \le j \le nh}$. By the Parseval identity
this sum is equal to $n\langle \Delta X, \Delta Y \rangle_{nh;k}$. In conclusion, in the case $H = 0$ and
for oracle weights our SPECV estimator reduces to the realized covolatility, which is the natural estimator
in this situation.
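The reduction to the realized covolatility in the noise-free case can be verified on a single block. The Python sketch below is an illustration under the following assumptions: the block weights are $\Phi_{jk}(l/n) = \sqrt{2/h}\,(2n\sin(j\pi/(2nh)))^{-1}\sin(j\pi i/(nh))$ at the $i$-th grid point of the block, equal weights $1/(nh)$ are used, and the identity is checked for the increments at the interior grid points $i = 1, \dots, nh - 1$, where these weights do not vanish. The increments are arbitrary test numbers.

```python
import math
import random

def spectral_block_sum(dx, dy, n, h):
    """n^{-1} sum_j ||Phi_jk||_n^{-2} x_jk y_jk for one block.

    dx, dy hold the increments at the interior grid points i = 1, ..., nh - 1.
    The frequency j = nh is omitted because its weights vanish on the grid.
    """
    m = round(n * h)
    total = 0.0
    for j in range(1, m):
        s = math.sin(j * math.pi / (2.0 * m))
        scale = math.sqrt(2.0 / h) / (2.0 * n * s)     # amplitude of Phi_jk
        inv_norm_sq = 4.0 * n * n * s * s              # inverse of the empirical norm (9)
        xj = sum(dx[i - 1] * scale * math.sin(j * math.pi * i / m) for i in range(1, m))
        yj = sum(dy[i - 1] * scale * math.sin(j * math.pi * i / m) for i in range(1, m))
        total += inv_norm_sq * xj * yj
    return total / n

def realized_block_covariation(dx, dy):
    """Sum of products of increments on the block."""
    return sum(a * b for a, b in zip(dx, dy))

n, h = 64, 1 / 8
m = round(n * h)
rng = random.Random(2)
dx = [rng.gauss(0.0, 1.0) for _ in range(m - 1)]   # arbitrary test increments
dy = [rng.gauss(0.0, 1.0) for _ in range(m - 1)]

print(spectral_block_sum(dx, dy, n, h), realized_block_covariation(dx, dy))
```

The agreement reflects that the sine vectors form an orthogonal basis on the interior grid points of a block, so summing over all frequencies with equal weights simply undoes the transform.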

Let us finally mention that the pilot estimators (17) and the estimator (22) slightly differ from the one
in Reiß (2011) because we use the accurate $\Phi_{jk}$ for the discrete setup and their empirical norms
defined above.



4 Discussion and simulations 

As mentioned before, previously proposed nonparametric approaches have in common that they are quadratic
forms of the observation vectors and, when choosing corresponding weights or weight functions, translate
into each other and show corresponding asymptotic properties. Nevertheless, each method is motivated
from a slightly different point of view. The first two-scales realized volatility (TSRV) approach by
Zhang et al. (2005) for the integrated volatility has been grounded on a subsampling method and a bias
correction. Disregarding the bias correction, the subsampling estimator is the mean of lower-frequency and
hence less noise-sensitive realized volatilities. Zhang (2006) has extended this procedure to a linear
combination using different time-scales (MSRV). The kernel approach by Barndorff-Nielsen et al. (2008) can
be viewed as a linear combination of empirical autocovariances. Finally, the pre-average principle by
Jacod et al. (2009), true to its name, incorporates (pre-)averaged weighted observations on blocks. The
latter is closest to our methodology, but uses Haar functions instead of (4b).

For all three, the trade-off between the error due to noise and the discretization error is handled by
choosing a global tuning parameter $c\sqrt{n}$, where $c$ is a constant, minimizing the MSE to order
$n^{-1/2}$. Thus, the optimal convergence rate is attained. If we neglect, in favour of these methods, the
possible asymptotic influence of end effects, they have an asymptotic variance structure
$\mathfrak{N}c^{-3} + \mathfrak{D}c + \mathfrak{E}c^{-1}$, where the signal part $\mathfrak{D}$ depends in
our notation on $\Sigma$, the noise part $\mathfrak{N}$ on $H$ and the cross term $\mathfrak{E}$ on both.
Minimization leads to

$$c = \Bigl(\bigl(\mathfrak{E} + \sqrt{\mathfrak{E}^2 + 12\,\mathfrak{N}\mathfrak{D}}\bigr)\big/(2\mathfrak{D})\Bigr)^{1/2}.$$

The oracle solution is proportional to $\eta$ for equal noise variances $\eta^2$ of $X$ and $Y$.
Interestingly, Barndorff-Nielsen et al. (2008) have succeeded in the univariate case with constant
volatility in approximately attaining the lower bound from Gloter & Jacod (2001) by a clever selection
method for their bandwidth and weights, and also a feasible version with Tukey-Hanning kernels comes very
close to that bound. Essentially, the main difference to our proposed approach is that we do not need to
fix a tuning parameter and weights globally, but are able to adapt the weights locally, depending only on
the observations on each particular block.

We content ourselves with the findings in an idealized statistical model which gives insight into the
fundamental structure of the estimation problem. Note that an i.i.d. assumption on the noise and
Hölder continuity conditions on the volatility processes are customary in the strand of literature on
nonparametric estimation methods. In our opinion, it is convenient to look at methods derived from a
simple model and to inspect the effect of misspecification on them. In the microstructure noise setup, we
might first think of a diffusion with constant parameters. Xiu (2010) has taken a path in this vein by
reviving the classical MLE in this framework and proving its robustness to a typical nonparametric setup.
A local parametric approach is more flexible and in general increases the performance. More surprising
than the agreement of asymptotic properties for the aforementioned three nonparametric methods is that
Xiu (2010) reports that the Quasi-MLE approach is in this sense asymptotically equivalent to the kernel
approach as well. This is not the case for our SPEV/SPECV approach, which underlines the originality of
our local spectral estimation method. Extensions of the theory that investigate the properties of the
SPEV/SPECV in more general models, e.g. incorporating stochastic volatility and non-Gaussian errors,
remain an open task for further research. For the moment the simple structure of SPEV/SPECV makes us
confident that it can be robust to much more general model specifications.

Figure 2: Boxplots for constant (left) and time varying (right) spot correlation and volatilities and normal
QQ-Plots for MSRC and SPECV estimates in the time varying setting.

Let us give concrete examples to compare the asymptotic variances of our SPECV and the other methods.
For the simple parametric setting with constant $\sigma^X = 2$ and $\sigma^Y = 1$ and
$\eta := \eta_X = \eta_Y = 1$, in Figure 1 we depict the asymptotic variances for $\rho \in (-1, 1)$ of the
SPECV from Theorem 3, of the multi-scale realized covariance as deduced in Bibinger (2011b), of the
pre-average estimator as given in Christensen et al. (2010) and of the QMLE from Ait-Sahalia et al. (2010),
all with an optimal oracle tuning parameter selected as described above. The asymptotic variances are
proportional to $\eta$, so that Figure 1 rescaled by $\eta$ is meaningful for arbitrary noise levels. The
SPECV has the smallest asymptotic variance and the QMLE the largest in this particular setup, all at the
same optimal rate of convergence. The kernel method according to Barndorff-Nielsen et al. (2011) is not
included, since the multivariate version has a non-optimal $n^{1/5}$-rate by oversmoothing to the benefit of
positive semi-definiteness. We stress that we intentionally have picked unequal constant volatilities here
to the disadvantage of the QMLE, which relies on the polarization identity. For equal volatilities
$\sigma^X = \sigma^Y$ in our model, $X + Y$ and $X - Y$ are independent and the polarized QMLE (also a
polarized SPEV concurrent to the SPECV) will not suffer a disadvantage by polarization. From the comparison
to the Fisher information in Bibinger (2011a), we can learn that the QMLE and the SPECV exhibit asymptotic
efficiency for $\rho = 0$ in this setting. The approach presented in Barndorff-Nielsen et al. (2008) to
derive asymptotic efficiency in the one-dimensional scalar case can easily be extended to a bivariate
synchronous setting and renders a rate-optimal approach, asymptotically efficient for $\rho = 0$ as well.
Yet none of these estimators is asymptotically efficient on the whole parameter space
$(\rho, \sigma^X, \sigma^Y) \in ((0, 1), \mathbb{R}_+, \mathbb{R}_+)$ and it is beyond the scope of this
article to finish the quest for a globally asymptotically efficient estimator that outperforms the
concurrent methods in each case.
Here we aim at providing with SPECV a method that performs well for time varying functions by local
adaptivity and thus focus on that setting in the following. For this purpose we compare asymptotic
variances in the same spirit for $\rho \in (-1, 1)$ and

$$\sigma_t^X = 0.1 - 0.08 \cdot \sin(\pi t), \quad t \in [0, 1],$$
$$\sigma_t^Y = 0.15 - 0.07 \cdot \sin((6/7) \cdot \pi t), \quad t \in [0, 1],$$

which will as well be considered for the simulation part below. We add the theoretical asymptotic variance
of a simple extension of the optimal kernel estimator for integrated volatility from Barndorff-Nielsen
et al. (2008), which can be approximated by Tukey-Hanning kernels. This approach features the smallest
asymptotic variance in a wide domain of $\rho$ among the compared non-locally adaptive methods. Even so,
the SPECV clearly comes below this benchmark. The right display of Figure 1 shows that the gains of
SPECV compared to the previously proposed methods are much more distinct than in the scalar case.
After this theoretical comparison and the conclusion that the SPECV is preferable, especially in the general
nonparametric setting, we shed light on the finite sample size behaviour of our approach in a Monte Carlo
study.

In the first simulation, we compare the SPECV with the multiscale realized covariance (MSRC), both with
an oracle choice of weights and tuning parameter, respectively. First, we implement a simple parametric
model with $n = 30000$ equidistant observations of $X$ and $Y$, where $\sigma^X = \sigma^Y = 1$,
$\rho = 1/2$ and noise levels $\eta_X = \eta_Y = 0.1$. The implemented MSRC as given in Bibinger (2011a) is
for synchronous observations a direct extension of the MSRV by Zhang (2006) and translates asymptotically
to the kernel estimator with a cubic kernel. It is known to have a good finite sample size behavior. We
implement the SPECV with an adequate heuristic choice $h = 1/30$ such that $nh = 1000$.
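
To make the setup concrete, the following sketch generates one path of noisy synchronous observations under the stated parametric model. It is an illustration under assumptions, not the simulation code behind the reported results: the function names are hypothetical, the two noise components are taken to be independent of each other ($\eta_{XY} = 0$), and the naive realized variance is included only to show how strongly the noise dominates at this sampling frequency.

```python
import math
import random


def simulate_noisy_observations(n=30000, sigma_x=1.0, sigma_y=1.0, rho=0.5,
                                eta_x=0.1, eta_y=0.1, seed=0):
    """Simulate X, Y at times i/n, i = 1..n, and add i.i.d. Gaussian noise.

    Assumption: the noise components are independent of each other (eta_XY = 0).
    """
    rng = random.Random(seed)
    step = math.sqrt(1.0 / n)
    x = y = 0.0
    obs_x, obs_y = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Correlated Brownian increments with constant volatilities.
        x += sigma_x * step * z1
        y += sigma_y * step * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        # Additive observation noise on top of the latent path.
        obs_x.append(x + eta_x * rng.gauss(0.0, 1.0))
        obs_y.append(y + eta_y * rng.gauss(0.0, 1.0))
    return obs_x, obs_y


def realized_variance(obs):
    """Sum of squared increments of consecutive observations."""
    return sum((b - a) ** 2 for a, b in zip(obs, obs[1:]))
```

With these parameters the realized variance of the noisy series is roughly $1 + 2 n \eta_X^2 \approx 601$ instead of the integrated variance 1, which is the reason why noise-robust estimators are required here.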

The empirical distribution of the estimates from 10 000 MC iterations is visualized in a boxplot in Figure
2. The SPECV estimates have an empirical variance of $0.49 \cdot n^{-1/2}$ and the MSRC of
$0.71 \cdot n^{-1/2}$. The empirical finding is that in this setting the SPECV is closer to its theoretical
asymptotic variance of about 0.46 than the MSRC to its theoretical value of 0.52.

Our main focus will be the non-scalar case. For an example of deterministic time-varying functions, set

$$\sigma_t^X = 0.1 - 0.08 \cdot \sin(\pi t), \quad t \in [0, 1],$$
$$\sigma_t^Y = 0.15 - 0.07 \cdot \sin((6/7) \cdot \pi t), \quad t \in [0, 1],$$
$$\rho_t = 0.5 + 0.01 \cdot \sin(\pi t), \quad t \in [0, 1],$$

where the volatilities are higher at the beginning and end of the observed interval and the correlation is only
slowly varying, which mimics basic realistic features. We keep the noise levels $\eta_X = \eta_Y = 0.1$ fixed
and rather high compared to the signal part. The known integrated covolatility equals 0.00269 here. Since
the noise level is high and dominates the signal part, the frequencies chosen according to the above given
selection rule for the MSRC estimator become large (over 1 000) and the computing time increases for this
kind of nonparametric estimator. As can be seen in the right boxplot of Figure 2, the SPECV outperforms
the MSRC more clearly for non-constant volatilities and correlation. This confirms that the spectral local
technique is more adequate to capture the effect of time-varying volatilities by local adaptation, not only
theoretically but also significantly in the finite sample case. The QQ-Plots in Figure 2 inspect the normal
approximation for the two estimators from this Monte Carlo study in the time varying case.
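
The integrated covolatility quoted for this time-varying example can be reproduced by numerical integration of $\rho_t \sigma_t^X \sigma_t^Y$ over $[0, 1]$. The sketch below uses a plain midpoint rule; the function names and the number of steps are our own illustrative choices.

```python
import math


def sigma_x(t):
    return 0.1 - 0.08 * math.sin(math.pi * t)


def sigma_y(t):
    return 0.15 - 0.07 * math.sin((6.0 / 7.0) * math.pi * t)


def rho(t):
    return 0.5 + 0.01 * math.sin(math.pi * t)


def integrated_covolatility(num_steps=100_000):
    """Midpoint rule for the integral of rho_t * sigma_t^X * sigma_t^Y over [0, 1]."""
    dt = 1.0 / num_steps
    total = 0.0
    for i in range(num_steps):
        t = (i + 0.5) * dt
        total += rho(t) * sigma_x(t) * sigma_y(t)
    return total * dt


print(round(integrated_covolatility(), 5))  # 0.00269
```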
Estimator                                                              RMSE
Adaptive SPEV for $\int_0^1 (\sigma_t^X)^2 \, dt$                      0.0072
Adaptive SPEV for $\int_0^1 (\sigma_t^Y)^2 \, dt$                      0.0086
Adaptive SPECV for $\int_0^1 \rho_t \sigma_t^X \sigma_t^Y \, dt$       0.0034
Oracle MSRC for $\int_0^1 \rho_t \sigma_t^X \sigma_t^Y \, dt$          0.0035
Oracle SPECV for $\int_0^1 \rho_t \sigma_t^X \sigma_t^Y \, dt$         0.0015

Table 3: Comparison of root mean squared errors of MC (co-)volatility estimates.

Figure 4: Boxplot of 10000 MC iterations of the oracle MSRC and oracle/adaptive SPECV (panels: MSRC (oracle),
SPECV (oracle), SPECV (adaptive)).

We conclude the simulation study with an implementation of the adaptive SPEV/SPECV. We use pilot
estimators (13) and (17) for $\Sigma$ at times $l \cdot nh$, $l = 0, \ldots, 30$, with $K = 30$ adjacent blocks.
The 10 000 MC estimates of the adaptive SPECV are illustrated in Figure 4. Table 3 summarizes the root
mean squared errors of all three adaptive SPEV/SPECV estimators and the oracle SPECV and the oracle
MSRC. The performance of the adaptive version of SPECV cannot keep up with the oracle version, but
in our simulation it is still slightly better than the oracle MSRC. For an adaptive MSRC the root mean
squared error will clearly become larger and we refer to Bibinger (2011b) for the method and simulation
results pertaining to this point.
A Appendix: Proofs of asymptotic equivalence
Proof of Theorem 1

We start with the constructive proof that $(\mathcal{E}_0)$ is at least as informative as $(\mathcal{E}_1)$.
We use the linear B-splines

$$b_i(t) = \mathbb{1}_{[\frac{i-1}{n}, \frac{i+1}{n}]}(t) \min\Big(1 + n\Big(t - \frac{i}{n}\Big),\; 1 - n\Big(t - \frac{i}{n}\Big)\Big),$$

i.e. $\operatorname{supp} b_i = [(i-1)/n, (i+1)/n]$, $b_i(i/n) = 1$, and $b_i$ linear on $[(i-1)/n, i/n]$ and
$[i/n, (i+1)/n]$. Consider the centred Gaussian process $\hat Z$ defined by

$$\hat Z_t = \sum_{i=1}^n \tilde Z_i b_i(t) = \sum_{i=1}^n Z_{i/n} b_i(t) + \sum_{i=1}^n \epsilon_i b_i(t). \qquad (23)$$
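The covariance function of $\hat Z = (\hat X, \hat Y)^\top$ is

$$\mathbb{E}[\hat Z_t \hat Z_s^\top] = \sum_{i,j=1}^n A\Big(\frac{i \wedge j}{n}\Big) b_i(t) b_j(s) + \sum_{i=1}^n H \, b_i(t) b_i(s)$$

with

$$A(t) = \int_0^t \Sigma_s \, ds = \begin{pmatrix} A_{11}(t) & A_{12}(t) \\ A_{12}(t) & A_{22}(t) \end{pmatrix} \quad \text{and} \quad H = \begin{pmatrix} \eta_X^2 & \eta_{XY} \\ \eta_{XY} & \eta_Y^2 \end{pmatrix}.$$

For any $f = (f_X, f_Y)^\top \in L^2([0, 1], \mathbb{R}^2)$, we have

$$\begin{aligned}
\mathbb{E}[\langle f, \hat Z \rangle^2] &= \mathbb{E}[(\langle f_X, \hat X \rangle + \langle f_Y, \hat Y \rangle)^2] \\
&= \mathbb{E}[\langle f_X, \hat X \rangle^2] + \mathbb{E}[\langle f_Y, \hat Y \rangle^2] + 2 \mathbb{E}[\langle f_X, \hat X \rangle \langle f_Y, \hat Y \rangle] \\
&= \sum_{i,j=1}^n A_{11}\Big(\frac{i \wedge j}{n}\Big) \langle f_X, b_i \rangle \langle f_X, b_j \rangle + \sum_{i=1}^n \eta_X^2 \langle f_X, b_i \rangle^2 \\
&\quad + \sum_{i,j=1}^n A_{22}\Big(\frac{i \wedge j}{n}\Big) \langle f_Y, b_i \rangle \langle f_Y, b_j \rangle + \sum_{i=1}^n \eta_Y^2 \langle f_Y, b_i \rangle^2 \\
&\quad + 2 \sum_{i,j=1}^n A_{12}\Big(\frac{i \wedge j}{n}\Big) \langle f_X, b_i \rangle \langle f_Y, b_j \rangle + 2 \sum_{i=1}^n \eta_{XY} \langle f_X, b_i \rangle \langle f_Y, b_i \rangle.
\end{aligned}$$

The sum of the three terms induced by the observation noise is bounded from above by
$n^{-1}(\eta_X^2 \|f_X\|^2 + \eta_Y^2 \|f_Y\|^2 + 2 \eta_{XY} \langle f_X, f_Y \rangle)$, since

$$\begin{aligned}
&\sum_{i=1}^n \eta_X^2 \langle f_X, b_i \rangle^2 + \sum_{i=1}^n \eta_Y^2 \langle f_Y, b_i \rangle^2 + 2 \sum_{i=1}^n \eta_{XY} \langle f_X, b_i \rangle \langle f_Y, b_i \rangle \\
&= \Big(\frac12 + \frac{\eta_{XY}}{2 \eta_X \eta_Y}\Big) \sum_{i=1}^n \langle \eta_X f_X + \eta_Y f_Y, b_i \rangle^2 + \Big(\frac12 - \frac{\eta_{XY}}{2 \eta_X \eta_Y}\Big) \sum_{i=1}^n \langle \eta_X f_X - \eta_Y f_Y, b_i \rangle^2 \\
&\le \frac1n \Big[\Big(\frac12 + \frac{\eta_{XY}}{2 \eta_X \eta_Y}\Big) \|\eta_X f_X + \eta_Y f_Y\|^2 + \Big(\frac12 - \frac{\eta_{XY}}{2 \eta_X \eta_Y}\Big) \|\eta_X f_X - \eta_Y f_Y\|^2\Big] \\
&= \frac1n \big(\eta_X^2 \|f_X\|^2 + \eta_Y^2 \|f_Y\|^2 + 2 \eta_{XY} \langle f_X, f_Y \rangle\big).
\end{aligned}$$

For the upper bound we have used that $\int n b_i(t) \, dt = 1$ implies
$\langle f_X, n b_i \rangle^2 \le \langle f_X^2, n b_i \rangle$ by Jensen's inequality and $\sum_i b_i \le 1$,
and analogously for the other terms. Now observe that

$$\mathbb{E}[\langle f, H^{1/2} dW \rangle^2] = \mathbb{E}\Big[\Big(\int_0^1 f_t^\top H^{1/2} \, dW_t\Big)^2\Big] = \eta_X^2 \|f_X\|^2 + \eta_Y^2 \|f_Y\|^2 + 2 \eta_{XY} \langle f_X, f_Y \rangle.$$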



As a consequence, observations from $\bar Z$ defined by

$$d\bar Z_t = \sum_{i=1}^n Z_{i/n} b_i(t) \, dt + \frac{1}{\sqrt{n}} H^{1/2} \, dW_t \qquad (24)$$

with a two-dimensional standard Brownian motion $W$ can be generated from $(\mathcal{E}_0)$ by adding additional
$N(0, \bar C - \hat C)$-noise, where $\hat C : L^2 \to L^2$ is the covariance operator of $\hat Z$ and the
covariance operator $\bar C : L^2 \to L^2$ associated with (24) is given by

$$\bar C f(t) = \int_0^1 \sum_{i,j=1}^n A\Big(\frac{i \wedge j}{n}\Big) b_i(t) b_j(u) f(u) \, du + n^{-1} H f(t), \quad f \in L^2([0, 1], \mathbb{R}^2).$$

Let $C$ be the covariance operator

$$C f(t) = \int_0^1 \Big(\int_0^{t \wedge u} \Sigma_s \, ds\Big) f(u) \, du + n^{-1} H f(t)$$

from $(\mathcal{E}_1)$. In the following, $\underline{H}$ denotes the smallest eigenvalue of $H$ as in Section 2.
In the extension of the findings for the one-dimensional case, which has been treated in Section A.2 in
Reiß (2011), we make use of the convenient upper bound for the squared Hellinger distance between two normal
measures by the squared Hilbert-Schmidt norm denoted $\|\cdot\|_{HS}$. For a concise introduction to
Hellinger distances between Gaussian measures and the Hilbert-Schmidt norm we refer to Section A.1 in
Reiß (2011).
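The asymptotic equivalence of observing $\bar Z$ and $\tilde Z$ in $(\mathcal{E}_1)$ is ensured by the
Hellinger distance bound

$$\begin{aligned}
\mathrm{H}^2(\mathcal{L}(\bar Z), \mathcal{L}(\tilde Z)) &\le 2 \|C^{-1/2} (\bar C - C) C^{-1/2}\|_{HS}^2 \\
&\le 2 \underline{H}^{-2} n^2 \int_0^1 \int_0^1 \Big\| A(t \wedge s) - \sum_{i,j=1}^n A\Big(\frac{i \wedge j}{n}\Big) b_i(t) b_j(s) \Big\|^2 dt \, ds \\
&= O(\underline{H}^{-2} R^2 n^{-(2\alpha \wedge 1)}) \to 0 \quad \text{for } \alpha > 0.
\end{aligned}$$

Note that we have estimated the $L^2$-distance between $A(t \wedge s)$ and its coordinate-wise linear
interpolation by $O(n^{-1-\alpha})$ using a standard approximation result based on the fact that the function
$(t, s) \mapsto A(t \wedge s)$ lies in the class $C^{1+\alpha}$ away from the diagonal $\{t = s\}$ due to
$A'(t) = \Sigma_t \in C^\alpha$ and is Lipschitz at the diagonal (on the $n - 1$ squares
$[(i-1)/n, (i+1)/n]^2$ the pointwise bound $O(n^{-1})$ only contributes $(n-1) O(n^{-4}) = O(n^{-3})$ to the
squared $L^2$-distance).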
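
The two properties of the linear B-splines used in the argument above, namely that $n b_i$ integrates to one for interior knots and that the $b_i$ sum to at most one on $[0, 1]$, can be checked numerically. The sketch below is only an illustration; the function names and grid sizes are our own choices.

```python
def b_spline(i, n, t):
    """Linear B-spline b_i: equals 1 at i/n and vanishes outside [(i-1)/n, (i+1)/n]."""
    return max(0.0, min(1.0 + n * (t - i / n), 1.0 - n * (t - i / n)))


def integral_n_b(i, n, cells=1000):
    """Midpoint rule for the integral of n * b_i over its support."""
    left = (i - 1) / n
    step = (2.0 / n) / cells
    return sum(n * b_spline(i, n, left + (k + 0.5) * step) for k in range(cells)) * step


def max_partition_sum(n, grid=2001):
    """Largest value of sum_{i=1}^n b_i(t) over an equidistant grid on [0, 1]."""
    return max(
        sum(b_spline(i, n, k / (grid - 1)) for i in range(1, n + 1))
        for k in range(grid)
    )
```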

The proof that $(\mathcal{E}_1)$ is at least as informative as $(\mathcal{E}_0)$ is obtained by a similar
estimate and a generalization of the construction technique from the one-dimensional setting. For this
purpose, set



Z' 



(2i+l)/2n 



dZ, = 



(2i 



-l)/2n 



(2i+l)/2n 
(2i-l)/2n 

dZt 



Ztdt + El, 1 < i < {n^ 
Ztdt- 



1), 



2n 



(2n-l)/2n 



(2ri-l)/2n 



with 



The estimate that 



(2i+l)/2n 



(2i-l)/2n 



H'/'dWt - N(0,H) 



(l (Z'i,...,Z;) ,£(Zi 



Ihs 



< 2H"^||C" - C\ 



2 

HS 



('H"2i?2n-2a^ 



^)) <2\\C-'^' [C -C) C-'/'Wl 

') 

establishes the result. Altogether, the Le Cam distance between the experiments $(\mathcal{E}_0)$ and
$(\mathcal{E}_1)$ is of order $O(\underline{H}^{-1} R n^{-\alpha})$. Assuming that $A$ is
$(1 + \alpha)$-Hölder continuous ($\alpha$-Hölder regularity of the covolatility and volatilities), the
asymptotic equivalence of the statistical experiments with discretely observed noisy diffusions and the
continuous time white noise model is deduced. □

Proof of Theorem 2

The proof follows the one-dimensional result and its proof in Section A.3 of Reiß (2011). It is shown
that the Hilbert-Schmidt norm of the difference between the experiment $(\mathcal{E}_1)$ and the one where
$\Sigma$ is evaluated at times $\lfloor t \rfloor_h := \max\{kh \mid kh \le t\}$, $0 \le k \le h^{-1} - 1$,
tends to zero. In the two-dimensional setting, we have a Hölder bound

$$\|\Sigma_t - \Sigma_{\lfloor t \rfloor_h}\|_\infty \le R h^\alpha, \quad t \in [0, 1].$$

Denote by $C_\Sigma$ the covariance operator associated with the experiment $(\mathcal{E}_1)$ with volatility
matrix $\Sigma$. For $f \in L^2([0, 1], \mathbb{R}^2)$ let $F : [0, 1] \to \mathbb{R}^2$ be the corresponding
antiderivative with $F(1) = (0, 0)^\top$. The difference of the two covariance operators of experiments with
$\Sigma$ and $\Sigma^h$, where $\Sigma_t^h := \Sigma_{\lfloor t \rfloor_h}$, respectively, pertains only to the
signal part:

$$\langle (C_\Sigma - C_{\Sigma^h}) f, f \rangle = \int_0^1 F_t^\top (\Sigma_t - \Sigma_t^h) F_t \, dt \le \|\Sigma - \Sigma^h\|_\infty \langle \mathcal{C} f, f \rangle$$

by partial integration, where $\mathcal{C}$ denotes the covariance operator of a standard two-dimensional
Brownian motion. We end up with the following upper bound for the Hilbert-Schmidt norm:



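$$\begin{aligned}
\|C_{\Sigma^h}^{-1/2} (C_\Sigma - C_{\Sigma^h}) C_{\Sigma^h}^{-1/2}\|_{HS} &\le \|\Sigma - \Sigma^h\|_\infty \, \|C_{\Sigma^h}^{-1/2} \, \mathcal{C} \, C_{\Sigma^h}^{-1/2}\|_{HS} \\
&\le \|\Sigma - \Sigma^h\|_\infty \, \|(\mathcal{C} \Sigma + H n^{-1} \mathrm{id})^{-1/2} \, \mathcal{C} \, (\mathcal{C} \Sigma + H n^{-1} \mathrm{id})^{-1/2}\|_{HS} \\
&\le R h^\alpha \|G(\mathcal{C})\|_{HS}.
\end{aligned}$$

The function $G(z) = z (z \Sigma + H n^{-1})^{-1}$ is applied to $\mathcal{C}$ employing functional calculus.
The operator $\mathcal{C}$ has the same spectral values as the covariance operator of a one-dimensional standard
Brownian motion with double multiplicity. Hence, the result is derived directly from the spectral analysis
for the one-dimensional case in Reiß (2011). □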



A.1 Proof of Proposition 2

The proof follows exactly along the lines of the proof of Theorem 1. The only difference is the bound on the
$L^2([0, 1]^2)$-distance between the functions $A(t \wedge s)$ and
$\sum_{i,j} A((i \wedge j)/n) b_i(t) b_j(s)$. Since $\Sigma$ is blockwise constant, $A$ is linear on each
interval $[(i-1)/n, i/n]$. By the linear interpolation property the two functions coincide on each square
$[(i-1)/n, i/n] \times [(j-1)/n, j/n]$ for $i \ne j$. For the $n$ squares where $i = j$, the Lipschitz
property of $(t, s) \mapsto A(t \wedge s)$ yields a total $L^2$-distance of order $n^{-3/2}$ (cf. again the
proof of Theorem 1) and the bound on the Le Cam distance follows.

B Appendix: Asymptotics of the local spectral (co-)volatility estimator

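We start with the following standard formula for a bivariate normal distribution, which will be used
implicitly several times.

Lemma 1. For a Gaussian random vector

$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_X^2 & \rho \sigma_X \sigma_Y \\ \rho \sigma_X \sigma_Y & \sigma_Y^2 \end{pmatrix}\right)$$

it holds true that

$$\mathrm{Var}(XY) = (1 + \rho^2) \sigma_X^2 \sigma_Y^2 \quad \text{and} \quad \mathrm{Var}(X^2) = 2 \sigma_X^4, \; \mathrm{Var}(Y^2) = 2 \sigma_Y^4. \qquad (25)$$

Proof. The nature of the Gaussian distribution allows us to write
$Y = \rho (\sigma_Y / \sigma_X) X + \sqrt{1 - \rho^2} \, \sigma_Y Z$, where $Z \sim N(0, 1)$ is independent of
$X$. Since $\mathbb{E}[X^4] = 3 \sigma_X^4$ and

$$\begin{aligned}
\mathbb{E}[X^2 Y^2] &= \mathbb{E}[X^2 \, \mathbb{E}[Y^2 | X]] = \mathbb{E}[X^2 (\rho^2 (\sigma_Y^2 / \sigma_X^2) X^2 + (1 - \rho^2) \sigma_Y^2 \, \mathbb{E}[Z^2])] \\
&= [3 \rho^2 + (1 - \rho^2) \mathbb{E}[Z^2]] \, \sigma_X^2 \sigma_Y^2 = (1 + 2 \rho^2) \sigma_X^2 \sigma_Y^2,
\end{aligned}$$

we directly conclude the statement of the Lemma, using $\mathbb{E}[XY] = \rho \sigma_X \sigma_Y$ and hence
$\mathrm{Var}(XY) = \mathbb{E}[X^2 Y^2] - (\mathbb{E}[XY])^2$. □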
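
The product-variance formula of Lemma 1 is easy to confirm by simulation. The following sketch draws the pair through the same representation of $Y$ as in the proof and compares the empirical variance of $XY$ with $(1 + \rho^2) \sigma_X^2 \sigma_Y^2$; the function name, sample size and seed are illustrative assumptions.

```python
import math
import random


def simulate_product_variance(sigma_x, sigma_y, rho, num_samples=200_000, seed=1):
    """Empirical variance of X*Y for a centred bivariate Gaussian vector (X, Y)."""
    rng = random.Random(seed)
    products = []
    for _ in range(num_samples):
        x = sigma_x * rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)  # independent of x
        y = rho * (sigma_y / sigma_x) * x + math.sqrt(1.0 - rho ** 2) * sigma_y * z
        products.append(x * y)
    mean = sum(products) / num_samples
    return sum((p - mean) ** 2 for p in products) / (num_samples - 1)


def theoretical_product_variance(sigma_x, sigma_y, rho):
    """Closed form from Lemma 1."""
    return (1.0 + rho ** 2) * sigma_x ** 2 * sigma_y ** 2
```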

The elementary identities (25) and (7) are central tools in the error analysis for the SPECV estimator.
The latter is proved in the following.

Proof of Proposition 1

Equation (7) is basically an application of discrete summation by parts, the analogue of the integration by
parts formula, also called Abel transformation.

The elementary identity $\sin(x + h) - \sin x = 2 \cos(x + h/2) \sin(h/2)$ yields:



1=1 

n-l 

/ / / -I- I \ 



i=0 
n-l 



1 



The boundary terms vanish due to $\Phi_{jk}(0) = \Phi_{jk}(1) = 0$. Further simple relations for trigonometric
functions reveal the orthogonality properties (8a) and (8b). Without loss of generality, consider the first
block $k = 0$:



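$$\begin{aligned}
[\varphi_{j0}, \varphi_{r0}]_n &= \frac1n \sum_{l=0}^{nh-1} \frac2h \cos\big(j \pi h^{-1} n^{-1} (l + 1/2)\big) \cos\big(r \pi h^{-1} n^{-1} (l + 1/2)\big) \\
&= \frac1n \sum_{l=0}^{nh-1} h^{-1} \Big(\cos\big((j + r) \pi h^{-1} n^{-1} (l + 1/2)\big) + \cos\big((j - r) \pi h^{-1} n^{-1} (l + 1/2)\big)\Big) = \delta_{jr}.
\end{aligned}$$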



The last equality holds since for arbitrary $m \in \mathbb{N}$:

fniTT f, 1\\ ^"^^ . / /2/ + 1TO 1\\ ^"^^ . / fl 2l + lm 

1=0 1=0 1=0 

1=0 1=0 



Analogously we deduce that 



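$$\begin{aligned}
\langle \Phi_{j0}, \Phi_{r0} \rangle_n &= \frac1n \sum_{l} \frac{\sin(j \pi h^{-1} t_l) \sin(r \pi h^{-1} t_l)}{2 h n^2 \sin\big(\frac{j \pi}{2nh}\big) \sin\big(\frac{r \pi}{2nh}\big)} \\
&= \frac1n \sum_{l} \frac{\cos((j - r) \pi h^{-1} t_l) - \cos((j + r) \pi h^{-1} t_l)}{4 h n^2 \sin\big(\frac{j \pi}{2nh}\big) \sin\big(\frac{r \pi}{2nh}\big)} \\
&= \delta_{jr} \big(4 n^2 \sin^2(j \pi / (2nh))\big)^{-1} = \delta_{jr} \|\Phi_{jk}\|_n^2.
\end{aligned}$$

We conclude that the families of functions $(\varphi_{jk})$, $(\Phi_{jk})$ are orthogonal systems with respect
to $[\cdot, \cdot]_n$ and $\langle \cdot, \cdot \rangle_n$, respectively. □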
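
The discrete orthogonality relations can be verified numerically on the first block. The sketch below assumes the basis functions in the form suggested by the displays of this proof, $\varphi_{j0}(t) = \sqrt{2/h} \cos(j \pi t / h)$ evaluated at the midpoints $(l + 1/2)/n$ and $\Phi_{j0}(t) = \sqrt{2/h} \sin(j \pi t / h) / (2n \sin(j \pi / (2nh)))$ evaluated at the grid points $l/n$; these forms and all function names are assumptions made for illustration.

```python
import math


def phi(j, h, t):
    """Assumed cosine basis function on the first block [0, h]."""
    return math.sqrt(2.0 / h) * math.cos(j * math.pi * t / h)


def big_phi(j, n, h, t):
    """Assumed sine basis function on the first block, normalised as in the proof."""
    return (math.sqrt(2.0 / h) * math.sin(j * math.pi * t / h)
            / (2.0 * n * math.sin(j * math.pi / (2.0 * n * h))))


def bracket_n(j, r, n, m):
    """[phi_j0, phi_r0]_n: empirical inner product over the m = nh midpoints (l + 1/2)/n."""
    h = m / n
    return sum(phi(j, h, (l + 0.5) / n) * phi(r, h, (l + 0.5) / n) for l in range(m)) / n


def inner_n(j, r, n, m):
    """<Phi_j0, Phi_r0>_n: empirical inner product over the m = nh grid points l/n."""
    h = m / n
    return sum(big_phi(j, n, h, l / n) * big_phi(r, n, h, l / n) for l in range(m)) / n
```

For frequencies $1 \le j, r \le nh - 1$ the first quantity equals $\delta_{jr}$ and the second equals $\delta_{jr} (4 n^2 \sin^2(j \pi / (2nh)))^{-1}$ up to rounding errors.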



Proof of Theorem 3

Though we have exploited the well-known distribution characteristics for blockwise averages $(x_{jk}, y_{jk})$
directly in our simple Gaussian model in order to motivate the spectral estimator, for better transparency
and clarity we give a detailed analysis of the asymptotic expectation and variance here. By Proposition 2
we can equivalently work with model $(\mathcal{E}_3)$, where the observations are generated by a blockwise
constant spot volatility $\Sigma_{\lfloor t \rfloor_h}$.

Consider at first $\widehat{IC}^{(SPECV)}_{\mathrm{oracle},n}$ with known spot volatility matrix and correlation.
We drop the superscript and subscripts in the following. The estimator is (asymptotically in
$(\mathcal{E}_0)$) unbiased since

h^^-l nh 
h^^-l nh 



fc=0 j = i 

= hpkh(rkh^lh^ / Ptcr^al dt + o{l), 

fc=o -^0 

The variance calculation is simplified by the independent block structure 

h-^-l / nh \ 

Var(/C)= J2 /^'Var K]||<l>,fc||-2^«,fei,fey,J 

k=0 \j=l ) 



in view of Parseval identity, Itô isometry, the orthogonality relations (8a) and (8b) and
$\sum_{j=1}^{nh} w_{jk} = 1$.



Consider the variance on the $k$th block. By the orthogonality relations (8a) and (8b) of the
$\varphi_{jk}$'s and $\Phi_{jk}$'s and application of (25) to $\Sigma_{kh}$ and $H$, the evaluation of the
variance on the block yields



(nh \ nh 



+ E ll<I'.-fcll;:'«^'fcVar((AX,n$,fc)„(Ay,n$,,)rO 

nh 
■nh 

+ 2 ^ II ||,;4^„2^E [( [e^, ^,,] „ ^,k\ „ (AX, n$,fc)„ (AF, n'^,k)n)\ 

'^^ / 2 2 j_ 2 



2 2 

[(^,fe,^,,]„E[(nAF,AF)„,;fe] \\^,kfn + ^ M„ E [(nAX, AX)„„fe] 



+2^ [^,fc,^,fe]„E[(nAX, Ar)„;,,fe] 
\ n 
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Wjk yW^jkWn ^ (1 + Pkh)[<^kh<^kh) 

' i^IhVX? + {^khVY? + '^PkhCr^h'^lhVXY 



We have used Ito isometry, the features of model ([Saji and Proposition [T| To increase the readability we 
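introduce the shortcut $I_{jk}$. Selecting appropriate weights with $\sum_{j=1}^{nh} w_{jk} = 1$ on the blocks
gives rise to an optimization problem with side condition. Minimizing the asymptotic variance yields oracle
weights

$$w_{jk} = \frac{I_{jk}}{\sum_{m=1}^{nh} I_{mk}}, \qquad (26)$$

where

$$I_{jk} = \Big(\|\Phi_{jk}\|_n^{-4} \, \frac{\eta_X^2 \eta_Y^2 + \eta_{XY}^2}{n^2} + (1 + \rho_{kh}^2)(\sigma_{kh}^X \sigma_{kh}^Y)^2 + \|\Phi_{jk}\|_n^{-2} \, \frac{(\sigma_{kh}^X \eta_Y)^2 + (\sigma_{kh}^Y \eta_X)^2 + 2 \rho_{kh} \sigma_{kh}^X \sigma_{kh}^Y \eta_{XY}}{n}\Big)^{-1}.$$

Plugging in these weights, the asymptotic variance on the $k$th block becomes
$\sum_j I_{jk}^{-1} (I_{jk} / \sum_i I_{ik})^2 = (\sum_i I_{ik})^{-1}$. Next, consider

$$\frac{1}{\sqrt{n} h} \sum_{j=1}^{nh} I_{jk} = \frac{1}{\sqrt{n} h} \sum_{j=1}^{nh} \Big(a + b \, n^2 \sin^4\Big(\frac{j \pi}{2nh}\Big) + c \, n \sin^2\Big(\frac{j \pi}{2nh}\Big)\Big)^{-1},$$

with the shortcuts $a = (1 + \rho_{kh}^2)(\sigma_{kh}^X \sigma_{kh}^Y)^2$,
$b = 16 (\eta_X^2 \eta_Y^2 + \eta_{XY}^2)$ and
$c = 4 ((\eta_X \sigma_{kh}^Y)^2 + (\eta_Y \sigma_{kh}^X)^2 + 2 \eta_{XY} \rho_{kh} \sigma_{kh}^X \sigma_{kh}^Y)$.
For $0 < \alpha < 3/8$, we obtain the bound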



nh 1 / , zt 20/o^4r»7/l 9 10/oi9„,9\ ^ 



^ ^ _J_ 

AT/, ^J'^ - ,A7, 



■n/i I a 



5 2 7r4n'"/«+4"/i4 c ^2„io/8+2a;j2 

-77- — 7T-; h -n- 



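where we use that $\sin x \ge x/2$ on $(0, 1)$, and further that by Taylor expansion

$$\begin{aligned}
\frac{1}{\sqrt{n} h} \sum_{j=1}^{nh} I_{jk} &= \frac{1}{\sqrt{n} h} \sum_{j=1}^{nh} \Big(a + \frac{b \pi^4 j^4}{16 n^2 h^4} + \frac{c \pi^2 j^2}{4 n h^2}\Big)^{-1} + o(1) \\
&= \frac{1}{\sqrt{n} h} \sum_{j=1}^{nh} \Big(a + (b/16) \pi^4 \big(j / (\sqrt{n} h)\big)^4 + (c/4) \pi^2 \big(j / (\sqrt{n} h)\big)^2\Big)^{-1} + o(1).
\end{aligned}$$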

We have thus shown that uniformly for all k, the high frequencies j > 77^/** do not contribute to the variance 
due to their decreasing weights and thus the sine functions may be approximated by the first order Taylor 
expansion. The overall variance is with ho /i^/n {ri^-qy + ^xy) 

VAR„ = 'y ' {r^WY + v'xY^' f E hk] 



and hence, y/n {rj^'qY + '7xy) VAR„ and 77 ^/^/j 1 /^^ have the structure of Riemann sums. Be- 
cause of ft-o — > 00 we can replace j /hi^ by an integration variable z and we are going to prove that 



lim 77 

n— >-oo \ 
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with 

/.W^JUE3;-.)-.V + .v'''^'''"''-^''^'''''''-^^''"'-''""''''>+(l+.')(.V)'. (28) 



From /°° ^)'(^)| dz < CO and 



\fi{z)-'-fi{j/ho)-'\dz< / \{fr'nu)\dudz<h^' \{fr'nz)\dz 

j/ho Jj/ha Jj/ho Jj/ha 

we infer that the approximation error in (27) is indeed of order $O(h_0^{-1})$ and thus tends to zero. Notice
that $nh / h_0$ tends to infinity.

The computation of the integral over the positive axis is provided in Proposition 3 below. The convergence
of the total variance as given in Theorem 3, using $h \to 0$, follows in the same way.

Concerning the adaptive weights obtained from a preestimation step, we can work conditionally on
them and by independence just assume to dispose of a deterministic sequence $\Sigma_{t,n}$ converging
uniformly to $\Sigma_t$. By definition of the weights summing up to one, the bias is (asymptotically) zero as
in the oracle case. The variance on a block is correspondingly given by

$$V_k^{(n)} = \frac{\sum_{j=1}^{nh} (I_{jk}^{(n)})^2 I_{jk}^{-1}}{\big(\sum_{r=1}^{nh} I_{rk}^{(n)}\big)^2}$$

with $I_{jk}^{(n)}$ defined like $I_{jk}$, but in terms of the approximate values $\Sigma_{t,n}$ instead of
$\Sigma_t$. In view of the order $\|\Phi_{jk}\|_n^{-1} \sim j/h$, compare (9), the asymptotically dominating
term in $I_{jk}^{-1}$ is $n^{-2} \|\Phi_{jk}\|_n^{-4} (\eta_X^2 \eta_Y^2 + \eta_{XY}^2)$, which is independent
of $\Sigma_t$. Together with the uniform convergence of $\Sigma_{t,n}$ this observation shows
$|(I_{jk}^{(n)})^{-1} - I_{jk}^{-1}| = o(I_{jk}^{-1})$ uniformly over all $j$ and $k$. Consequently, we also
have $|I_{jk}^{(n)} - I_{jk}| = o(I_{jk})$ uniformly and we conclude uniformly in $k$

$$V_k^{(n)} = V_k (1 + o(1)) \quad \text{with} \quad V_k := \Big(\sum_{r=1}^{nh} I_{rk}\Big)^{-1}$$

the oracle variance over a block. Pursuing the same calculus as in the oracle case, the rescaled total variance
converges to the same integral.
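
The oracle weights and the resulting block variance can be made tangible with a short computation. The sketch below assumes the form $I_{jk}^{-1} = a + b n^2 \sin^4(j \pi / (2nh)) + c n \sin^2(j \pi / (2nh))$ with the shortcuts $a$, $b$, $c$ of this proof and constant parameters on the block; the function names and the default $\eta_{XY} = 0$ are illustrative assumptions.

```python
import math


def inverse_information(j, n, h, sx, sy, rho, ex, ey, exy=0.0):
    """Variance I_jk^{-1} of the j-th spectral statistic on a block (assumed form)."""
    s2 = math.sin(j * math.pi / (2.0 * n * h)) ** 2
    a = (1.0 + rho ** 2) * (sx * sy) ** 2
    b = 16.0 * (ex ** 2 * ey ** 2 + exy ** 2)
    c = 4.0 * ((ex * sy) ** 2 + (ey * sx) ** 2 + 2.0 * exy * rho * sx * sy)
    return a + b * n ** 2 * s2 ** 2 + c * n * s2


def oracle_weights(n, h, **params):
    """Weights w_jk = I_jk / sum_m I_mk together with the list of I_jk, j = 1..nh."""
    m = round(n * h)
    info = [1.0 / inverse_information(j, n, h, **params) for j in range(1, m + 1)]
    total = sum(info)
    return [i / total for i in info], info


def block_variance(weights, info):
    """Variance sum_j w_jk^2 / I_jk of the weighted block estimate."""
    return sum(w ** 2 / i for w, i in zip(weights, info))
```

For the parametric simulation setting ($n = 30000$, $h = 1/30$, unit volatilities, $\rho = 1/2$, noise levels $0.1$), the rescaled variance $\sqrt{n} \, h / \sum_j I_{jk}$ obtained in this way is close to the asymptotic value of about 0.46 reported above.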

In the literature on nonparametric estimation methods for related and more general models, much
mathematical effort is put into the proof of (stable) central limit theorems. For our Gaussian models the
conclusion of asymptotic normality is direct. We can apply a standard i.i.d. triangular central limit theorem
like Corollary 3.1 from Hall & Heyde (1980), verifying a Lyapunov condition with fourth moments.

The following Proposition completes the proof of Theorem 3. For the computation of the Riemann
integrals we use some concepts from complex analysis.



Proposition 3. Consider the functions $f_1 : \mathbb{C} \to \mathbb{C}$ generalizing the function (28) and
$f_2 : \mathbb{C} \to \mathbb{C}$ with

$$f_2(z) := f_2(\rho, \sigma; z) = \pi^4 z^4 + 2 \sigma^2 \pi^2 z^2 + (1 + \rho^2) \sigma^4,$$

a special case of $f_1$, which depend on parameters $\rho$ and positive $\sigma^{\cdot}$, $\eta_{\cdot}$. For the
improper integrals along the positive real line in the case $\sigma^{\cdot} > 0$, $\rho \ne 0$, the following
identities hold true:

^ -^^= .w/, .M/. sinf^(Arg(i-p)~^)V (29a) 



(29b) 
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where A and B are short expressions for the terms 

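$\operatorname{Arg}(z)$, with $\operatorname{Arg}(z) = \arctan(\operatorname{Im}(z) / \operatorname{Re}(z))$ for
$\operatorname{Re}(z) > 0$, denotes the argument of a complex number and we determine to take the root located
in the upper half plane $\mathbb{H} = \{z \in \mathbb{C} \mid \operatorname{Im}(z) > 0\}$ in (29b) in the case
that $A^2 - B < 0$.

For $\rho = 0$ and strictly positive $\sigma^{\cdot}$ (and $\eta_{\cdot}$), the integrals yield

$$\int_0^\infty \frac{1}{f_2(x)} \, dx = \frac{1}{4 \sigma^3}, \qquad (30a)$$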

Proof. The meromorphic functions $f_1^{-1} : \mathbb{C} \to \mathbb{C}$, $f_2^{-1} : \mathbb{C} \to \mathbb{C}$
have four simple poles in the complex plane, since $f_1$, $f_2$ each has four simple non-real zeros. We can
apply a specific version of the residue theorem (cf. Theorem 7.10 in Chapter III of Freitag & Busam (2005))
to evaluate the above improper real integrals. We restrict ourselves to the case $\rho \ne 0$, for which the
solutions of (29a) and (29b) are not feasible using algebra programs or standard integral tables.

We first give the proof of (29a) for the simplified function $f_2$. The zeros of $f_2$ are



$$z_{2;1,2,3,4} = \pm \exp(i \pi / 4) \, \frac{\sigma}{\pi} \sqrt{\pm \rho + i}$$

and are located symmetrically on a circle around the origin in the complex plane. The residue theorem allows
us to calculate the integral $\int_{-\infty}^{\infty} f_2^{-1}(x) \, dx$ along the real line by the limit of a
curve integral over a half-disk in the upper half plane. Since $f_2$ is even on the real line and $z_{2;1}$
and $z_{2;4}$ are the poles in the upper half plane, we obtain:

/ -rrr = '^i- (R-SS (/2"^ 22;l) + Res (/a^^ Z2;4)) 
Jo J2\X) 

= 7ri (^(4cxp (i7r/4)7r^p + ia^ + 4exp (i37r/4)7r(p + 

- (4exp (i7r/4)7r yr^cr^ +4exp(i37r/4)7r(i- p)''/V^) 

= ^(-i)-v^((i-p)-v^-(i + p)-v^) 



In this proof we always use the unique square root in the upper half plane of complex numbers (and the 
usual definition for real numbers). 
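
As a plausibility check of the integral over $1/f_2$, the following sketch evaluates $\int_0^\infty dz / f_2(z)$ numerically via the substitution $z = \tan\theta$ and a midpoint rule. For $\rho = 0$ it can be compared with $1/(4\sigma^3)$ from (30a). The function `closed_form` is our own partial-fractions evaluation, $1 / (4 \sigma^3 (1 + \rho^2)^{3/4} \cos(\tfrac12 \arctan\rho))$, and is not claimed to be formula (29a) verbatim; all names are illustrative assumptions.

```python
import math


def f2(z, rho, sigma):
    return (math.pi ** 4 * z ** 4 + 2.0 * sigma ** 2 * math.pi ** 2 * z ** 2
            + (1.0 + rho ** 2) * sigma ** 4)


def integral_inverse_f2(rho, sigma, num_steps=200_000):
    """Midpoint rule for int_0^inf dz / f2(z) after substituting z = tan(theta)."""
    d = (math.pi / 2.0) / num_steps
    total = 0.0
    for i in range(num_steps):
        z = math.tan((i + 0.5) * d)
        total += (1.0 + z * z) / f2(z, rho, sigma)  # dz = (1 + z^2) d(theta)
    return total * d


def closed_form(rho, sigma):
    """Partial-fractions value of the integral (own derivation, see lead-in)."""
    return 1.0 / (4.0 * sigma ** 3 * (1.0 + rho ** 2) ** 0.75
                  * math.cos(0.5 * math.atan(rho)))
```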

The analysis for the general case /i is a bit more involved, since depending on the parameters p, a and 



the ratios TxlyTx'nY + "Txy- "Hy / y Tx'Hy + Txy *e zeros of fi. 

$$z_{1;1,2,3,4} = \pm \frac{1}{\sqrt{2} \, \pi} \sqrt{-A \pm \sqrt{A^2 - B}},$$

it holds true that either $z_{1;1}$ ("$++$") and $z_{1;4}$ ("$--$") or $z_{1;1}$ ("$++$") and $z_{1;2}$
("$+-$") are located in the upper half plane. This role change dependent on whether $A^2 - B$ is positive or
negative is illustrated in Figure 5, in which the interesting factor appearing in the solution of the integral
is depicted for a possible range of values for $A$ and $B$ in a certain codomain of $\rho$, $\sigma^X$,
$\sigma^Y$, $\eta_X / \eta_Y$ for $\eta_{XY} = 0$.

Figure 5: Real and imaginary part of $\sqrt{-A - i \sqrt{B - A^2}} + \sqrt{-A + i \sqrt{B - A^2}}$.

Using the above convention for square roots, the left-hand side of (29b) yields



/•OO • -1 



1 1 



\/2 - bVb 



^ 1/12^5 -sgn(yl2 _ 5 



In the first line, "$\pm$" indicates that, depending on the parameters, there are two different solutions. As
visualized in Figure 5, the right factor in the second line is purely real if $B - A^2 > 0$ and purely
imaginary if $B - A^2 < 0$. The expressions in the second and third line hence give the positive real solution
in each case.
In the case B ~ A'^ > Q, we can write the solution V^{B - A^y^l^B'^/^ cos Q Arg(A + i\/B - A^)^ 
similiarly to (|29a|i above. □ 
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